Using the Bravo Liquid-Handling System for Next Generation Sequencing Sample Prep
Tom Walsh, PhD
Division of Medical Genetics
University of Washington
Next generation sequencing
- Sanger sequencing has been the gold standard for over 30 years
- Next generation sequencing is massively parallel
- Millions of short reads, each 50-100bp
- 1,000-10,000 fold more sequence data
- Very low cost per base
Target enrichment
- Current sequencing capacity enables whole genome sequencing (3000Mb) or sequencing of whole exome coding regions (40Mb)
- Target enrichment allows specific capture and sequencing of only the genes associated with a particular disease/phenotype
- Smaller sequencing target = reduced cost and higher sample throughput
- Here: applying target enrichment and sequencing to the detection of mutations that predispose women to developing breast and ovarian cancer
BRCA1 and BRCA2
- Inherited mutations in BRCA1 and BRCA2 predispose to high risks of developing breast and ovarian cancer
- Clinical recommendations for women with BRCA1 and BRCA2 mutations include increased surveillance and risk-reducing surgical removal of the ovaries and fallopian tubes after child-bearing is complete
- The advent of PARP inhibitors, which preferentially kill BRCA1 and BRCA2 mutated cancers, has increased the clinical incentive to identify mutation carriers
Family 1. BRCA1 c.2800 AA
[Pedigree figure: relatives annotated with cancer site (Br, Ov, Pr, Pa, Co) and age, and with genotype codes V and N]
- Two hits: inherited mutation + somatic loss of the wildtype allele
- The somatic mutation is generally a chromosomal deletion
Family 16. BRCA2 c.1310 AAGA (BRCA2 1529 del AAAG -> 456 stop)
[Pedigree figure: relatives annotated with cancer site (Br, Pr, Pa, Es) and age, and with genotype codes V and N]
BRCA1 and BRCA2
- Large genes, each with >1000 different cancer-predisposing mutations
[Figure: mutation maps of BRCA1 and BRCA2. Source: NHGRI, Breast Cancer Information Core]
Mutation spectrum also includes large deletions and duplications not detectable by PCR
Genetic testing of BRCA1 and BRCA2
- In the U.S., testing is carried out almost exclusively by Myriad
- Protocol is based on PCR amplification of individual exons followed by Sanger sequencing on capillary instruments
- Large deletions and duplications are detected by a second test (BART, added in 2007), which measures copy number at exons
- Our goal: develop a comprehensive next generation sequencing approach for research testing of all breast cancer susceptibility genes
1. Rare multi organ cancer syndromes
- Li-Fraumeni (p53): sarcomas, leukemias, breast
- Cowden (PTEN): thyroid, endometrial, breast
- Diffuse gastric cancer (CDH1): gastric and breast
- Peutz-Jeghers (STK11): colon and breast
- Lynch (mismatch repair genes): colon, endometrial, ovarian
2. Moderate risk breast cancer genes
- Mutations in 9 genes lead to 2-4 fold increased risk of developing breast cancer
- Lower risk than BRCA1 and BRCA2 but still >25% lifetime risk
- Clinically relevant level of risk
[Figure: BRCA-Fanconi Anemia complex, showing ATM, p53, FANCD2, BARD1, BRCA1, BRIP1, CHEK2, PTEN, RAD51, BRCA2, NBS1, PALB2, RAD51C, MRE11 and RAD50]
Capturing 21 breast cancer genes
- Capture exons, introns, untranslated regions and 10kb up/downstream
- Total capture size = 939kb
- High risk: BRCA1, BRCA2
- Moderate risk: PALB2, CHEK2, BRIP1, NBS1, RAD50, MRE11, ATM, RAD50/51C
- Rare syndromes: p53, CDH1, PTEN, STK11
- Lynch syndrome: MLH1, MSH2, PMS1, PMS2, MUTYH
Capture design
- In-solution capture with cRNA 120mer oligo baits (SureSelect), tiled at 3x
- Repeat masked, but allow 20bp overlap where exons are closely flanked by Alu repeats (BRCA1)
- Tile through segmentally duplicated genes (CHEK2, PMS2, PTEN)
[Figure: cRNA baits (3x tiling) across BRCA1 exons and flanking repeats]
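A minimal sketch of what 3x tiling with 120mer baits can look like. It assumes that 3x tiling means each new bait starts one third of a bait length (40bp) after the previous one, so interior bases are covered by three baits. The function name and the coordinates are hypothetical, and repeat masking is not modelled.

```python
def tile_baits(region_start, region_end, bait_len=120, tiling=3):
    """Return (start, end) bait intervals covering a region.

    Assumption: the offset between consecutive baits is
    bait_len // tiling (40bp for 120mers at 3x tiling).
    """
    step = bait_len // tiling
    baits = []
    pos = region_start
    # Only emit baits that fit entirely inside the region.
    while pos + bait_len <= region_end:
        baits.append((pos, pos + bait_len))
        pos += step
    return baits


# Hypothetical 240bp target region.
for bait in tile_baits(0, 240):
    print(bait)
```

In this toy region, the bases between positions 80 and 160 fall inside three baits, while the region edges get lower bait coverage.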
Developing a one stop genetic test
- Simultaneously capture and sequence 21 genes known to predispose to breast and/or ovarian cancer
- Detect all mutation classes
  - Small: single base substitutions and indels
  - Large: exon deletions and duplications
- Proof of principle: test accuracy, sensitivity and specificity with 21 previously identified mutations from 10 genes
Capture and Sequencing
1. Sonicate (3µg DNA)
2. Prepare paired-end library (200bp)
3. Hybridize to biotinylated capture bait oligos (21 gene regions)
4. Purify with streptavidin beads
5. Sequence 2x76bp reads (9 days)
6. Identify SNPs and indels (MAQ and BWA)
7. Compare to dbSNP and mutation databases
8. Identify CNVs (depth of coverage)
Test series results - small mutations
- 15/15 small mutations from 10 different genes accurately identified
- Nonsense, splice site, missense and indels (1 to 19bp)
- Zero false positive calls of mutations in any gene or any sample
Mutation detection within duplicated regions
- Ratio of wildtype to mutation-containing reads is ~50/50
- One exception: CHEK2 1100delC (chr22:29,091,857), with approximately 15% mutant reads
- Partial CHEK2 pseudogenes (segmental duplications) are located on chromosomes 15 and 16
- 4 extra copies of the target region reduce the mutant to wildtype signal
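The CHEK2 observation can be checked with simple arithmetic. The sketch below assumes that reads from every copy of the duplicated sequence align to the target equally well, which is a simplification. The function name is hypothetical.

```python
def expected_mutant_fraction(mutant_copies=1, gene_copies=2, extra_copies=0):
    """Expected fraction of reads carrying the mutation.

    Assumption: every copy of the region (the two alleles of the gene
    plus any pseudogene copies) contributes reads equally.
    """
    return mutant_copies / (gene_copies + extra_copies)


# Unique region, heterozygous mutation: 1 of 2 copies.
print(expected_mutant_fraction())                 # 0.5
# CHEK2 1100delC with 4 extra pseudogene copies: 1 of 6 copies.
print(expected_mutant_fraction(extra_copies=4))
```

One mutant copy out of six gives about 17%, which is in line with the approximately 15% mutant reads observed.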
Test series results - large mutations
- 6/6 large mutations in BRCA1 and BRCA2 were accurately identified by depth of coverage ratios normalized for bait coverage and GC content

BRCA1 mutation           Ratio
Deletion exons 14-20     0.52
Deletion exon 17         0.49
Duplication exon 13      1.58
Deletion exons 1-15      0.51
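A minimal sketch of a depth of coverage ratio call. It normalizes only by total read depth, not by bait coverage or GC content as in the study. The thresholds of 0.75 and 1.25 are illustrative assumptions (midway between the expected ratios of 0.5, 1.0 and 1.5 for a heterozygous deletion, normal copy number and a duplication), and the function names are hypothetical.

```python
def coverage_ratio(sample_depth, sample_total, control_depth, control_total):
    """Ratio of depth at a region in a sample versus a control,
    each scaled by its own total depth."""
    return (sample_depth / sample_total) / (control_depth / control_total)


def classify_ratio(ratio, deletion_below=0.75, duplication_above=1.25):
    """Label a region from its coverage ratio.

    The thresholds are illustrative assumptions, not the study's cutoffs.
    """
    if ratio < deletion_below:
        return "deletion"
    if ratio > duplication_above:
        return "duplication"
    return "normal"


# Ratios reported for the BRCA1 test series.
for ratio in (0.52, 0.49, 1.58, 0.51):
    print(ratio, classify_ratio(ratio))
```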
Summary of proof of principle study
- DNA capture and sequencing is accurate and sensitive for detecting inherited mutations of clinically important genes
- Simultaneously evaluates all known breast and ovarian cancer genes
- Detects single base substitutions, indels and CNVs
- Accurate mutation detection in non-unique regions of the genome
Increasing throughput by barcoding samples
- Sequence coverage is very high (1000x) with one sample per lane
- Barcoding and pooling samples reduces sequencing costs
- Hybridize individual samples to SureSelect baits, then add a unique 6bp barcoded primer by PCR amplification after capture
- Sequence the barcode, demultiplex the samples, and analyze each sample individually
- Current throughput: 12 samples per lane, 96 per flow cell (GAIIx)
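A minimal sketch of the demultiplexing step, assuming each read arrives as a (barcode, sequence) pair and that only exact 6bp barcode matches are assigned. The sample names and reads are made up for illustration, and real pipelines often tolerate a barcode mismatch.

```python
from collections import defaultdict


def demultiplex(reads, sample_barcodes):
    """Group read sequences by sample using exact barcode matches.

    reads: iterable of (barcode, sequence) pairs.
    sample_barcodes: dict mapping 6bp barcode -> sample name.
    Reads with an unknown barcode are collected under "unassigned".
    """
    by_sample = defaultdict(list)
    for barcode, sequence in reads:
        sample = sample_barcodes.get(barcode, "unassigned")
        by_sample[sample].append(sequence)
    return dict(by_sample)


# Hypothetical sample names and reads.
barcodes = {"TGACCA": "sample_A", "CAGATC": "sample_B"}
reads = [
    ("TGACCA", "ACGT"),
    ("CAGATC", "GGCC"),
    ("TGACCA", "TTAA"),
    ("NNNNNN", "CCCC"),
]
print(demultiplex(reads, barcodes))
```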
Multiplexing 96 barcoded samples per flow cell
- Median coverage is 350x
- 97% of targeted bases have >100x coverage
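The two coverage statistics can be computed from per-base depths as sketched below. The depth values are made-up toy numbers, not data from the study, and the function name is hypothetical.

```python
from statistics import median


def coverage_summary(depths, min_depth=100):
    """Return (median depth, fraction of bases with depth > min_depth)."""
    above = sum(1 for d in depths if d > min_depth)
    return median(depths), above / len(depths)


# Toy per-base depths for illustration only.
toy_depths = [50, 150, 350, 400, 500]
med, frac = coverage_summary(toy_depths)
print(med, frac)
```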
Data from multiplexing 96 barcoded samples
Within pool of 12 samples per lane:

Barcode  Gene   Mutation        Location (hg19)               Wildtype  Variant
TGACCA   BRCA1  4510del3insTT   chr17:41,228,596-41,228,597   140       136
CAGATC   BRCA1  5382insC        chr17:41,228,596-41,228,597   89        80
TGACCA   BRCA2  9179G>C         chr13:32,953,650-32,953,650   212       201
GGCTAC   BARD1  1210del21       chr2:215,645,503-215,645,524  145       112
CGATGT   ATM    1027delGAAA     chr2:215,645,503-215,645,524  132       145

[Figure panel: BRCA2]
- 100x coverage enables accurate detection of all mutation classes
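As a quick check on the table, the variant fraction for each mutation can be computed from the two counts. The sketch assumes the Wildtype and Variant columns are read counts at the mutation site.

```python
# (gene, mutation, wildtype count, variant count) as listed in the table.
rows = [
    ("BRCA1", "4510del3insTT", 140, 136),
    ("BRCA1", "5382insC", 89, 80),
    ("BRCA2", "9179G>C", 212, 201),
    ("BARD1", "1210del21", 145, 112),
    ("ATM", "1027delGAAA", 132, 145),
]


def variant_fraction(wildtype, variant):
    """Fraction of counts at a site that support the variant."""
    return variant / (wildtype + variant)


for gene, mutation, wildtype, variant in rows:
    print(gene, mutation, round(variant_fraction(wildtype, variant), 2))
```

All five fractions fall between 0.4 and 0.6, close to the 50/50 ratio expected for heterozygous mutations in unique regions.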
Library prep is now the bottleneck
- Prep time for 96 sequence-ready libraries is 3 weeks with 3 FTEs
- The most labor-intensive part is the magnetic bead (SPRI) clean ups
  - Pre-capture library prep (SPRI clean up x5)
  - Capture hybridization x1
  - Post-capture washes and amplification (SPRI clean up x2)
Increasing throughput by automation
- x96: individual sample handling
- x1: 96 sample handling
Increasing throughput by automation
- All liquid handling, enzymatic incubations and post-capture washes are performed on the deck
- 96 well magnet allows plate-based SPRI clean up
Increasing throughput by automation
- Protocols can be edited easily and elution volumes changed
- Incorporated and validated post-capture off-bead PCR amplification
Summary
- Sample throughput increased with barcoding and automation
- With a standalone Bravo, preparing 96 sequence-ready libraries is no longer the bottleneck
- Reduced from 3 weeks with 3 FTEs to 3 days with 1 FTE
- Manual tip box replacement (not so bad)
- Complete walk-away automated system (on wish list)
Ongoing projects
- Ovarian cancer: sequenced 21 genes in 384 patients
  - All libraries prepped on Bravo, sequenced on 4 flowcells (GAIIx)
- Breast cancer: 1900 high risk families (KingLab collection)
  - Running 96 samples in a single lane of a HiSeq
  - 96 post-capture barcodes (early access from Agilent R+D)
- Testing Bravo for exome capture with NimbleGen EZ cap oligos
Acknowledgments
- Ming Lee, PhD: bioinformatics pipeline
- Alex Nord: CNV and breakpoint algorithms
- Anne Thornton, Chris Pennil, Silvia Casadei, PhD: library prep
- Mary-Claire King, PhD and Elizabeth Swisher, MD
- National Cancer Institute, Dept of Defense, Komen for the Cure