Genome mapping. Genome sequencing. Next Gen sequencing.

Genome mapping; genome sequencing; Next Gen sequencing. Genome mapping: cytogenetic band, 5-10 Mb; STS mapping; fingerprint mapping; YACs, ~1 Mb; BACs, ~150 Kb; human genome genetic map; sequence-ready BAC map. Genome sequencing, 1977-2003. Next Gen sequencing.

F. Sanger, S. Nicklen, and A. R. Coulson, Proc Natl Acad Sci U S A. 1977; 74: 5463-5467.

Science vol. 274, 25 October 1996; Science vol. 287, 24 March 2000; Science vol. 282, 11 December 1998.

15 February 2001; Nature vol. 421, 6 February 2003. 3 Gb draft sequence; 3,000 Mbp finished.

[Figure: Detection of fluorescently tagged DNA. Dideoxy-terminated fragments of increasing length, from 5'-...A-dd-3' up to 5'-...ACTAGTCCCATG-dd-3', pass an optical detection system whose output goes to a computer. Credit: Eric Green, NHGRI. Method: F. Sanger, S. Nicklen, and A. R. Coulson, Proc Natl Acad Sci U S A. 1977; 74: 5463-5467.]

Fluorescent DNA sequencing data (Eric Green, NHGRI; http://www3.appliedbiosystems.com). Quantifying sequence accuracy: http://www.phrap.com/phred/ and http://www.cas.vanderbilt.edu/bsci111a/sequence-analysis/tab-a-complete-trace.gif. Ewing B et al. and Ewing B, Green P, Genome Res. 1998; 8:175-185 (PMID: 9521921) and 8:186-194 (PMID: 9521922).
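As a minimal sketch of what "quantifying sequence accuracy" means in practice, the snippet below converts between a phred quality value Q and a base-call error probability P. It assumes the standard phred definition Q = -10 log10(P); the function names and the sample quality values are illustrative and do not come from the slides.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability implied by a phred quality value (assumes Q = -10*log10(P))."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality value implied by an error probability (inverse of the above)."""
    return -10 * math.log10(p)


def expected_errors(quals) -> float:
    """Expected number of miscalled bases in a read: the sum of per-base error probabilities."""
    return sum(phred_to_error(q) for q in quals)


# Hypothetical quality values, chosen only for illustration.
sample_quals = [10, 20, 30, 40]
for q in sample_quals:
    print(f"Q{q}: error probability {phred_to_error(q):.4f}")
print(f"expected errors in this 4-base stretch: {expected_errors(sample_quals):.4f}")
```

Under this definition, each step of 10 in quality value corresponds to a tenfold drop in the probability that the base call is wrong.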

>gnl|ti|2 name:g10p69425rg9.t0
10 15 9 7 7 7 4 4 4 4 9 4 0 4 0 4 4 6 6 6 6 7 7 7 6 6 4 6 6 4 0 4 6 4 4 4 6 4 0 4 6 6 4 4 0 4 6 8 12 12 8 6 4 0 4 8 6 6 6 8 8 7 7 7 9 15 15 25 28 28 33 33 33 34 34 36 36 33 30 30 26 18 18 9 7 7 12 18 18 24 24 23 23 21 21 25 26 26 26 26 26 24 33 34 24 24 24 26 26 25 23 23 20 20 20 20 33 33 40 40 26 26 26 26 30 26 38 38 38 45 45 30 33 30 30 23 23 26 26 26 26 28 45 45 45 45 45 45 41 41 41 45 45 45 37 37 40 37 37 37 37 37 37 45 45 49 49 49 49 42 34 34 34 34 34 34 42 42 42 42 42 37 37 37 40 45 23 25 21 28 28 30 45 49 45 42 40 42 42 42 42 42 42 42 42 42 33 33 33 35 35 35 42 42 42 42 40 33 33 25 22 18 23 21 23 23 42 45 51 51 42 40 42 37 37 41 51 51 51 51 51 51 39 42 30 30 30 33 33 35 40 42 42 39 39 39 39 39 39 39 51 41 43 41 40 40 33 28 28 28 29 28 33 35 35 33 33 39 41 41 45 45 45 45 49 42 42 45 45 40 42 45 45 45 49 51 51 51 51 45 45 42 42 42 37 45 30 30 30 45 45 51 45 45 45 41 41 51 45 39 32 30 30 30 30 34 45 45 45 40 40 40 42 42 42 51 51 45 45 45 41 41 39 51 51 49 49 45 45 22 22 22 36 36 39 42 42 42 42 42 42 51 51 51 51 51 51 51 51 51 51 51 49 42 35 35 35 35 35 35 45 40 40 40 42 42 42 49 45 45 51 51 45 45 49 49 45 45 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 49 49 45 45 39 39 51 51 51 51 45 41 41 41 45 45 45 45 45 51 49 49 45 45 45 45 41 41 45 51 51 51 51 51 51 51 37 33 33 33 33 33 37 45 45 45 43 41 41 40 37 33 33 33 33 33 33 40 40 37 37 37 45 41 45 45 49 49 49 45 49 49 49 45 45 41 41 41 41 45 45 49 49 49 45 45 45 45 42 38 37 37 36 34 45 49 49 49 45 40 40 40 40 40 37 37 37 45 45 45 34 34 34 34 34

F. Sanger, S. Nicklen, and A. R. Coulson, Proc Natl Acad Sci U S A. 1977; 74: 5463-5467. "It is a great source of joy to me that the dideoxy method is still the basic technique used. It was perhaps the climax of my career and makes me feel that all our previous studies on sequences with their successes and failures were not only enjoyable but also a worthwhile contribution to the future of medicine." Fred Sanger, 2001, Nature Med. 7:267-8.

Cytogenetic band, 5-10 Mb; YACs, ~1 Mb; BACs, ~150 Kb; shotgun clones, ~2 Kb or ~10 Kb; fragment; sequence: TTCAGCTGGAATCGAATTCATCGGT, ATTCATCGGTGTCGATGCTGATTAACTAGCTAGTTTACCCAA, AGTTTACCCAATACCCAATTCGATCGACCGATTCGAC; assemble contigs; finishing; finished sequence.
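The three example reads on this slide overlap end to end, which is what makes the "assemble contigs" step possible. The sketch below merges them by finding the longest suffix-prefix overlap between consecutive reads. It is a toy illustration under stated assumptions (reads already in the right order and orientation, exact matches only, a hypothetical minimum overlap of 5 bases), not the assembler used by any genome project.

```python
def overlap(a: str, b: str, min_len: int = 5) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_in_order(reads, min_len: int = 5) -> str:
    """Merge reads that are assumed to be listed in left-to-right order."""
    contig = reads[0]
    for read in reads[1:]:
        k = overlap(contig, read, min_len)
        if k == 0:
            raise ValueError("consecutive reads do not overlap")
        contig += read[k:]  # append only the part not already covered
    return contig


# The three reads shown on the slide.
reads = [
    "TTCAGCTGGAATCGAATTCATCGGT",
    "ATTCATCGGTGTCGATGCTGATTAACTAGCTAGTTTACCCAA",
    "AGTTTACCCAATACCCAATTCGATCGACCGATTCGAC",
]
contig = merge_in_order(reads)
print(contig)
```

Real assemblers also have to handle sequencing errors, reverse-complemented reads, and repeats, which is where the problems with the shotgun approach arise.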

Shotgun clones, ~2 Kb or ~10 Kb; fragment; sequence: TTCAGCTGGAATCGAATTCATCGGT, ATTCATCGGTGTCGATGCTGATTAACTAGCTAGTTTACCCAA, AGTTTACCCAATACCCAATTCGATCGACCGATTCGAC; assemble contigs; finishing; finished sequence. Problems with the shotgun approach. T.A. Brown, GENOMES 2, BIOS Scientific Publishers Ltd, 2002.

Problems with the shotgun approach (T.A. Brown, GENOMES 2, BIOS Scientific Publishers Ltd, 2002). Whole-human genome shotgun assembly: scaffold; 10, 50 kb inserts (J. C. Venter et al., Science 291, 1304-1351 (2001), published by AAAS). Perfect 2X coverage; random 2X coverage. 50% of the assembled sequence lies in contigs of length N50 or greater. Expectation for 7X WGS: 30 kb; HGP 7X WGS mouse assembly: ~24 kb. Waterston RH, Lander ES, Sulston JE (2002) On the sequencing of the human genome. PNAS 99: 3712-16; PNAS 100: 3022-3.
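The N50 definition on this slide (50% of the assembled sequence lies in contigs of length N50 or greater) can be turned into a short calculation. This is a minimal sketch; the function name and the contig lengths are hypothetical and serve only to illustrate the definition.

```python
def n50(contig_lengths) -> int:
    """Largest length L such that contigs of length >= L hold at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # at least 50% of the assembled bases reached
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths in kb, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

Because N50 weights contigs by the bases they contain, a few long contigs dominate the statistic and many short ones barely affect it.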

Whole-human genome shotgun assembly. Chromosomes; hybrid WGS and hierarchical sequencing: N50 = ~3.6 Mb (2.3 Mb HGP). BACs; 2, 10, 50 kb fragments; 5.1X coverage: N50 = ~86 Kb (82 Kb HGP). 2X shred of BACs. J. C. Venter et al., Science 291, 1304-1351 (2001), published by AAAS. Green ED (2001) Strategies for the Sequencing of Complex Genomes. Nature Reviews Genetics 2: 573. PMID: 11483982.

How many sequence reads do we need?

$P(k;\lambda) = \dfrac{\lambda^{k} e^{-\lambda}}{k!}$, where $k$ = number of events = number of times a given base is sequenced, and $\lambda$ = mean number of events = average sequence coverage.

Example: with average coverage $\lambda = 5$x, the probability that a given base is sequenced exactly $k = 10$ times is $5^{10} e^{-5} / 10! = 0.018$, so ~2% of bases will have exactly 10x coverage.

If you sequence at 10x coverage, how much of the genome will be sequenced at least 5 times? This is 1 minus the probability that a base is sequenced fewer than 5 times: $1 - [P(0;10) + P(1;10) + P(2;10) + P(3;10) + P(4;10)] = 0.97$.

Lander & Waterman, Genomics 2, 231-239 (1988).
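The two worked examples on this slide can be checked numerically. The sketch below evaluates the Poisson formula directly with the Python standard library; the function names are illustrative, and the model assumes, as the slide does, that the number of reads covering a base is Poisson-distributed with mean equal to the average coverage.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(k; lambda): probability that a base is sequenced exactly k times."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def fraction_at_least(k_min: int, lam: float) -> float:
    """Fraction of bases sequenced at least k_min times at mean coverage lam."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(k_min))


# Example 1: 5x mean coverage, exactly 10 reads over a base.
print(round(poisson_pmf(10, 5), 3))        # 0.018

# Example 2: 10x mean coverage, at least 5 reads over a base.
print(round(fraction_at_least(5, 10), 2))  # 0.97
```

The same two functions answer the planning question in reverse: raise the mean coverage until `fraction_at_least` reaches the depth target you need.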

Relationship of sequence coverage and contig length (Θ = fraction of clone overlap needed): http://www.genome.ou.edu/landerwatermantables1_2_3.htm. Large-scale genome sequence processing, by M Kasahara & S Morishita. Whole-human genome shotgun assembly: J. C. Venter et al., Science 291, 1304-1351 (2001), published by AAAS. Genome mapping; genome sequencing; Next Gen sequencing.