Iso-Seq Method Updates and Target Enrichment Without Amplification for SMRT Sequencing
PacBio Americas User Group Meeting, Sample Prep Workshop, June 27, 2017
Tyson Clark, Ph.D.
For Research Use Only. Not for use in diagnostics procedures. Copyright 2017 by Pacific Biosciences of California, Inc. All rights reserved.
AGENDA
- Iso-Seq Method Updates
  - Overview of New Iso-Seq Method Workflow for Sequel Systems
  - Example Sequel System Iso-Seq Method Datasets
  - Size Selection Option for Iso-Seq Method on the Sequel System
  - Sequencing and Experimental Design Recommendations
  - Summary
- Target Enrichment R&D Updates
  - Targeted Sequencing Using a CRISPR/Cas9-based Non-Amplification Method
  - Conclusion
Iso-Seq Method Updates
DETERMINATION OF TRANSCRIPT ISOFORMS
[Figure: a gene and its mRNA isoforms, with reads spanning splice junctions]
- Short-read technologies: Insufficient Connectivity; Splice Isoform Uncertainty
- PacBio's Iso-Seq Analysis solution: Full-length cDNA Sequence Reads; Splice Isoform Certainty; No Assembly Required
Sequel System Iso-Seq Procedure Overview
EXISTING ISO-SEQ SAMPLE PREP WORKFLOW ON THE PACBIO RS II INVOLVES EXTENSIVE SIZE SELECTION
[Workflow diagram: Total RNA -> Optional Poly-A Selection -> poly(A)+ RNA -> Reverse Transcription -> Full-Length 1st-Strand cDNA -> PCR Optimization -> Large-Scale Amplification -> Amplified cDNA -> Size Selection (BluePippin, SageELF, or gel) into 1-2, 2-3, 3-6, and 5-10 kb fractions -> Re-Amplification -> SMRTbell Template Preparation -> Optional Size Selection (BluePippin or SageELF) of the 5-10 kb fraction -> SMRT Sequencing]
DECREASED LOADING BIAS IN SEQUEL SYSTEM REDUCES NEED FOR SIZE SELECTION
Sequel System output correlates with input SMRTbell library size distribution
- Full-length Transcript Sizes: histogram of the number of full-length sequences by transcript length for a MagBead-loaded, non-size-selected Iso-Seq library sequenced on both the PacBio RS II and the Sequel System
- Bioanalyzer trace of a non-size-selected Iso-Seq library
- The full-length cDNA sequences run on the Sequel System closely resemble the size distribution of the input SMRTbell library
NEW STREAMLINED ISO-SEQ WORKFLOW FOR SEQUEL
Example Sequel System Iso-Seq Method Datasets
EXAMPLE SEQUEL SYSTEM ISO-SEQ METHOD DATASET: PRIMARY SEQUENCING METRICS

Sample     [On-Plate]  Total Gb  Movie (min)  Pol RL (bp)  Longest Subread  P0 (%)          P1 (%)          P2 (%)
1x + 0.4x  50 pM       4.89      360          11766        2816             339905 (32.8%)  415685 (40.1%)  281207 (27.1%)
1x + 0.4x  40 pM       7.57      600          13845        2855             172510 (17%)    546423 (53%)    317867 (31%)
1x + 0.4x  50 pM       7.17      600          12015        2928             102676 (10%)    596453 (58%)    337671 (33%)

- Target P1 ~50%, P0 10%
- MagBead loading
- Pre-extension = 120 min
- Polymerase Read Length increases with increasing movie time
- Obtained 4.5 Gb for a 6-h movie and 7 Gb for a 10-h movie
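The P0/P1/P2 percentages in the table are each class's share of all ZMWs on the SMRT Cell. As a minimal sketch (the function name `productivity_percentages` is hypothetical and not part of any PacBio tool), the percentages can be recomputed from the raw counts like this:

```python
def productivity_percentages(p0, p1, p2):
    """Return (P0, P1, P2) as percentages of all ZMWs, rounded to 0.1."""
    total = p0 + p1 + p2
    if total == 0:
        raise ValueError("no ZMWs counted")
    return tuple(round(100.0 * n / total, 1) for n in (p0, p1, p2))


# ZMW counts from the table, keyed by (on-plate pM, movie minutes).
runs = {
    (50, 360): (339905, 415685, 281207),
    (40, 600): (172510, 546423, 317867),
    (50, 600): (102676, 596453, 337671),
}

for (conc_pm, movie_min), counts in runs.items():
    p0, p1, p2 = productivity_percentages(*counts)
    print(f"{conc_pm} pM, {movie_min} min: P0={p0}% P1={p1}% P2={p2}%")
```

Comparing the computed P1 and P0 against the stated targets (P1 ~50%, P0 10%) is one simple way to judge whether the on-plate concentration should be raised or lowered for the next run.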
EXAMPLE SEQUEL SYSTEM ISO-SEQ METHOD DATASET: SECONDARY ANALYSIS METRICS (ISO-SEQ PROTOCOL)

Sample     [On-Plate]  Movie (min)  # CCS    CCS RL  # FLNC (%)       FLNC RL  # Polished HQ Isoforms  # Polished LQ Isoforms
1x + 0.4x  50 pM       360          415,539  2602    202,328 (48.7%)  2892     14,722                  96,755
1x + 0.4x  40 pM       600          545,724  2535    264,779 (48.5%)  2867     20,386                  132,119
1x + 0.4x  50 pM       600          595,533  2597    244,521 (41.1%)  3019     17,765                  125,467

- 1 Sequel SMRT Cell typically yields 200 K full-length non-chimeric (FLNC) reads
- Number of FLNC reads and HQ isoforms drops with increased loading concentration (40 pM to 50 pM on-plate) even though the # CCS increases
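For experimental design, the typical per-cell FLNC yield quoted above gives a rough way to estimate how many SMRT Cells a target depth requires. This is a back-of-the-envelope sketch only: the function name `cells_needed` is hypothetical, and the 200,000 default is the typical figure from the slide, not a guaranteed yield.

```python
import math

# Typical FLNC yield per Sequel SMRT Cell quoted on the slide; real runs vary.
FLNC_PER_CELL = 200_000


def cells_needed(target_flnc, flnc_per_cell=FLNC_PER_CELL):
    """Estimate the number of SMRT Cells needed to reach a target FLNC count."""
    if target_flnc < 0 or flnc_per_cell <= 0:
        raise ValueError("target must be >= 0 and per-cell yield must be > 0")
    return math.ceil(target_flnc / flnc_per_cell)


# Example: a hypothetical goal of one million FLNC reads.
print(cells_needed(1_000_000))
```

Passing an observed yield from a pilot cell as `flnc_per_cell` (for example, 264,779 from the 40 pM run) gives an estimate tailored to the actual library.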
Size Selection Option for Iso-Seq Method on the Sequel System
EFFECT OF USING SIZE SELECTION OPTION WITH ISO-SEQ METHOD ON THE SEQUEL SYSTEM
[Figure: Full-length Transcript Sizes; panels: non-size-selected library, and non-size-selected plus >4 kb size-selected library (co-loaded)]
- A BluePippin (or SageELF) size-selected library (4.5-10 kb) can be pooled with a non-size-selected library and co-loaded onto a single Sequel SMRT Cell
RECOMMENDATIONS FOR EXISTING SIZE-SELECTED FRACTIONS
[Figure: Transcript Count vs. Full-Length Transcript Size for the 0.5-2, 1.5-3, 2.5-6, and 4.5-10 kb fractions]
- Anneal and bind size-selected fractions separately
- Pool fractions in equimolar amounts (make adjustments where necessary)
- Data shown are for 4 size bins pooled together and sequenced on a single Sequel SMRT Cell
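Equimolar pooling means each fraction contributes the same number of molecules, so longer fractions need more mass. A minimal sketch of the arithmetic, assuming the common approximation of 660 g/mol per base pair of double-stranded DNA; the function names, concentrations, and mean sizes below are hypothetical placeholders, not values from the slides:

```python
def molarity_nm(conc_ng_per_ul, mean_size_bp):
    """Convert a dsDNA concentration (ng/uL) to nM, assuming 660 g/mol per bp."""
    if conc_ng_per_ul <= 0 or mean_size_bp <= 0:
        raise ValueError("concentration and size must be positive")
    return conc_ng_per_ul * 1e6 / (660.0 * mean_size_bp)


def equimolar_volumes(fractions, fmol_each):
    """Return the volume (uL) of each fraction that contains `fmol_each` fmol.

    `fractions` maps a label to (ng/uL, mean size in bp). 1 nM equals 1 fmol/uL.
    """
    return {
        label: fmol_each / molarity_nm(conc, size)
        for label, (conc, size) in fractions.items()
    }


# Hypothetical measurements for the four size bins (ng/uL, mean bp).
fractions = {
    "0.5-2 kb": (20.0, 1200),
    "1.5-3 kb": (18.0, 2200),
    "2.5-6 kb": (15.0, 4000),
    "4.5-10 kb": (10.0, 6500),
}

for label, volume in equimolar_volumes(fractions, fmol_each=10.0).items():
    print(f"{label}: {volume:.2f} uL")
```

The slide's note to "make adjustments where necessary" would correspond to scaling `fmol_each` per fraction, for example to give the longest bin extra representation.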
AVAILABLE TECHNICAL RESOURCES FOR ISO-SEQ ANALYSIS
- Iso-Seq Best Practices
- NimbleGen Targeted Iso-Seq
- Barcoding Iso-Seq
- Iso-Seq on Sequel
- IDT Targeted Iso-Seq
SUMMARY
- Prepare full-length transcripts using the Clontech SMARTer PCR cDNA Synthesis Kit with as little as 1 ng of poly(A)+ RNA or 2 ng of total RNA
- Sequel System loading protocols reduce the need for size selection for transcripts <4 kb
- Optional size-selection protocols enrich for transcripts >4 kb
- Survey transcriptomes in 1-2 SMRT Cells on the Sequel System
- Increase sequencing depth for more comprehensive transcriptome characterization
- Compatible with standard target enrichment methods, such as NimbleGen SeqCap EZ or IDT xGen Lockdown Probes
- Multiplex transcripts or full transcriptomes with sample barcoding
- Profile transcripts from multiplexed samples in a single Sequel SMRT Cell 1M
- Data analysis protocols and tools are available through SMRT Analysis and PacBio DevNet to generate high-quality, full-length transcript sequences with no assembly required
- Run Iso-Seq analysis in either de novo (no genome reference required) or reference-based mode
- Run Iso-Seq with Mapping analysis (map isoforms with GMAP) to study gene families and gene fusions and to accurately identify unique isoforms
Target Enrichment R&D Updates
Targeted SMRT Sequencing of Repeat-Expansion Disease Causative Genomic Regions Using a CRISPR/Cas9, Non-Amplification Based Method For Research Use Only. Not for use in diagnostics procedures. Copyright 2015 by Pacific Biosciences of California, Inc. All rights reserved.
REPEAT EXPANSION DISEASES
CRISPR/CAS9 SYSTEM
- Bacterial Adaptive Immunity
- RNA-guided DNA Endonuclease
- Some in vivo applications:
  - Gene silencing
  - Homology-directed repair
  - Transient gene silencing or transcriptional repression
  - Transient activation of endogenous genes
  - Transgenic animals and embryonic stem cells
PCR-FREE TARGET ENRICHMENT VIA CAS9 DIGESTION: METHOD OVERVIEW (CURRENTLY IN DEVELOPMENT)
USING CRISPR/CAS9 TO ENRICH FOR REPEAT EXPANSION DISORDERS
[Figure: Number of individual molecules sequenced; * = Restriction Enzyme]
- Improved on-target rate with complexity reduction:
  - Restriction enzymes are used to degrade unwanted SMRTbell templates
  - Additional starting DNA is required to maintain input into the Cas9 digestion step
- Multiplexing:
  - Multiple regions can be targeted in the same reaction
  - Patient samples could be barcoded during initial SMRTbell library preparation
COVERAGE ACROSS THE GENOME
RICARDO MOURO PINTO WILL BE DISCUSSING HUNTINGTON'S DISEASE TOMORROW
FRAGILE X SYNDROME
- Most common heritable form of cognitive impairment
- Caused by expansion of a CGG trinucleotide repeat in the 5' UTR of the FMR1 gene
(Source: fraxa.org)
>700 CGG REPEATS SEQUENCED FROM THE FMR1 GENE
AGG INTERRUPTIONS REDUCE THE CHANCES OF PRE- TO FULL-MUTATION TRANSMISSION
[Figure: risk of transmission vs. Maternal CGG repeat number, plotted for alleles with 0, 1, or 2 AGG interruptions within the CGG repeat; risk values of 80%, 60%, and 15% are labeled. Source: Yrigollen et al. (2012) Genet Med 14:729-736]
- Difference in risk is greatest near 75-80 CGG repeats
- Having full sequence information is medically relevant
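Because both the repeat count and the position of AGG interruptions matter, a read that spans the whole tract can be summarized triplet by triplet. The following is an illustrative sketch only, not the analysis used for the data in this talk: the function name `summarize_cgg_tract` is hypothetical, and it assumes the repeat tract has already been trimmed from its flanking sequence and is free of sequencing errors.

```python
def summarize_cgg_tract(tract):
    """Summarize a CGG repeat tract that may contain AGG interruptions.

    Returns (total triplets, 1-based AGG positions, longest pure CGG run).
    """
    tract = tract.upper()
    if len(tract) % 3 != 0:
        raise ValueError("tract length is not a multiple of 3")
    units = [tract[i:i + 3] for i in range(0, len(tract), 3)]
    unexpected = set(units) - {"CGG", "AGG"}
    if unexpected:
        raise ValueError(f"unexpected triplets: {sorted(unexpected)}")

    agg_positions = [i + 1 for i, unit in enumerate(units) if unit == "AGG"]

    # Track the longest stretch of consecutive CGG units.
    longest = run = 0
    for unit in units:
        run = run + 1 if unit == "CGG" else 0
        longest = max(longest, run)
    return len(units), agg_positions, longest


# Illustrative allele with two AGG interruptions, modeled on the slide's schematic.
allele = "CGG" * 4 + "AGG" + "CGG" * 9 + "AGG" + "CGG"
print(summarize_cgg_tract(allele))  # (16, [5, 15], 9)
```

Real reads would first need the tract located between its flanking sequences, and a tolerant matcher would be needed to handle residual sequencing errors.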
DIRECT DETECTION OF METHYLATION
METHYLATION DETECTION OF FMR1 SAMPLE
- The CGG repeat region appears to be heavily methylated (5mC)
CONCLUSION
Amplification-free enrichment with CRISPR/Cas9 and SMRT Sequencing achieves the base-level resolution required to understand the underlying biology of repeat expansion disorders
- Target any hard-to-amplify genomic region regardless of sequence context
- Avoid PCR bias and PCR errors
- Accurately sequence through long repetitive and low-complexity regions
- Count repeats and identify sequence interruptions
- Detect and characterize epigenetic modification signals
- Detect sample mosaicism
Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. All other trademarks are the sole property of their respective owners.