Genome mapping
  Human Genome Genetic Map
  STS mapping, fingerprint mapping
  Cytogenetic band (5-10 Mb), YACs (~1 Mb), BACs (~150 Kb)
  Sequence-ready BAC map
Genome sequencing (1977-2003)
Next Gen sequencing
F. Sanger, S. Nicklen, and A. R. Coulson, Proc Natl Acad Sci U S A. 1977; 74: 5463-5467
[Figure: journal covers. Science vol. 274, 25 October 1996; Science vol. 282, 11 December 1998; Science vol. 287, 24 March 2000]
[Figure: Nature covers, 15 February 2001 and vol. 421, 6 February 2003. Labels: 3 GB draft sequence; 3,000 Mbp finished]

Detection of Fluorescently Tagged DNA
[Figure: a nested set of fluorescently tagged extension products of increasing length, from 5' TTACGA up to 5' TTACGATGCGGAATGACGAATC, each ending in a dideoxynucleotide (shown schematically as 5'-----Add-3', 5'-----ACdd-3', 5'-----ACTdd-3', ... 5'-----ACTAGTCCCATGdd-3'). The fragments pass an optical detection system, and the output goes to a computer.]
F. Sanger, S. Nicklen, and A. R. Coulson, Proc Natl Acad Sci U S A. 1977; 74: 5463-5467
Eric Green, NHGRI
Fluorescent DNA Sequencing Data
Eric Green, NHGRI
http://www3.appliedbiosystems.com
http://www.cas.vanderbilt.edu/bsci111a/sequence-analysis/tab-a-complete-trace.gif

Quantifying sequence accuracy
http://www.phrap.com/phred/
Ewing B et al. and Ewing B & Green P, Genome Res. 1998; 8:175-185 (PMID: 9521921) and 8:186-194 (PMID: 9521922)
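Phred attaches a quality value Q to each base call. By the standard definition, Q = -10 log10(P), where P is the estimated probability that the call is wrong. The sketch below converts between the two; the function names and the sample scores are illustrative assumptions, not part of phred itself.

```python
import math

def phred_to_error_prob(q):
    """Error probability implied by a phred quality value Q."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality value implied by an error probability P."""
    return -10 * math.log10(p)

# Illustrative scores only (hypothetical sample, chosen by hand).
sample_scores = [10, 20, 30, 40]
for q in sample_scores:
    print(q, phred_to_error_prob(q))

# The expected number of miscalled bases in a read is the sum of the
# per-base error probabilities.
expected_errors = sum(phred_to_error_prob(q) for q in sample_scores)
```

Under this definition Q20 corresponds to a 1-in-100 chance of a wrong call and Q30 to 1 in 1,000, so the low scores at the start of a trace contribute most of the expected errors.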
>gnl ti 2 name:g10p69425rg9.t0
10 15 9 7 7 7 4 4 4 4 9 4 0 4 0 4 4 6 6 6 6 7 7 7 6 6 4 6 6 4 0 4 6 4 4 4 6 4 0 4 6 6 4 4 0 4 6 8 12 12 8 6 4 0 4 8 6 6 6 8 8 7 7 7 9 15 15 25 28 28 33 33 33 34 34 36 36 33 30 30 26 18 18 9 7 7 12 18 18 24 24 23 23 21 21 25 26 26 26 26 26 24 33 34 24 24 24 26 26 25 23 23 20 20 20 20 33 33 40 40 26 26 26 26 30 26 38 38 38 45 45 30 33 30 30 23 23 26 26 26 26 28 45 45 45 45 45 45 41 41 41 45 45 45 37 37 40 37 37 37 37 37 37 45 45 49 49 49 49 42 34 34 34 34 34 34 42 42 42 42 42 37 37 37 40 45 23 25 21 28 28 30 45 49 45 42 40 42 42 42 42 42 42 42 42 42 33 33 33 35 35 35 42 42 42 42 40 33 33 25 22 18 23 21 23 23 42 45 51 51 42 40 42 37 37 41 51 51 51 51 51 51 39 42 30 30 30 33 33 35 40 42 42 39 39 39 39 39 39 39 51 41 43 41 40 40 33 28 28 28 29 28 33 35 35 33 33 39 41 41 45 45 45 45 49 42 42 45 45 40 42 45 45 45 49 51 51 51 51 45 45 42 42 42 37 45 30 30 30 45 45 51 45 45 45 41 41 51 45 39 32 30 30 30 30 34 45 45 45 40 40 40 42 42 42 51 51 45 45 45 41 41 39 51 51 49 49 45 45 22 22 22 36 36 39 42 42 42 42 42 42 51 51 51 51 51 51 51 51 51 51 51 49 42 35 35 35 35 35 35 45 40 40 40 42 42 42 49 45 45 51 51 45 45 49 49 45 45 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 49 49 45 45 39 39 51 51 51 51 45 41 41 41 45 45 45 45 45 51 49 49 45 45 45 45 41 41 45 51 51 51 51 51 51 51 37 33 33 33 33 33 37 45 45 45 43 41 41 40 37 33 33 33 33 33 33 40 40 37 37 37 45 41 45 45 49 49 49 45 49 49 49 45 45 41 41 41 41 45 45 49 49 49 45 45 45 45 42 38 37 37 36 34 45 49 49 49 45 40 40 40 40 40 37 37 37 45 45 45 34 34 34 34 34

F. Sanger, S. Nicklen, and A. R. Coulson, Proc Natl Acad Sci U S A. 1977; 74: 5463-5467
"It is a great source of joy to me that the dideoxy method is still the basic technique used. It was perhaps the climax of my career and makes me feel that all our previous studies on sequences with their successes and failures were not only enjoyable but also a worthwhile contribution to the future of medicine."
Fred Sanger, 2001, Nature Med. 7:267-8

Cytogenetic band (5-10 Mb) -> YACs (~1 Mb) -> BACs (~150 Kb) -> shotgun clones (~2 Kb or ~10 Kb) -> sequence fragments
TTCAGCTGGAATCGAATTCATCGGT
ATTCATCGGTGTCGATGCTGATTAACTAGCTAGTTTACCCAA
AGTTTACCCAATACCCAATTCGATCGACCGATTCGAC
assemble contigs -> finishing -> finished sequence
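The "assemble contigs" step can be illustrated with the three example reads from the slide. This is a minimal sketch of greedy overlap merging, assuming error-free reads that all come from the same strand. It is not the algorithm used by any production assembler, and the function names and the `min_len` threshold are hypothetical.

```python
def overlap(a, b, min_len=5):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=5):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]
    return contigs

# The three reads shown on the slide.
reads = [
    "TTCAGCTGGAATCGAATTCATCGGT",
    "ATTCATCGGTGTCGATGCTGATTAACTAGCTAGTTTACCCAA",
    "AGTTTACCCAATACCCAATTCGATCGACCGATTCGAC",
]
contigs = greedy_assemble(reads)
print(contigs)
```

The first and second reads share a 10-base overlap (ATTCATCGGT) and the second and third share an 11-base overlap (AGTTTACCCAA), so the three reads collapse into a single contig. Real data adds sequencing errors, both strands, and repeats, which is where this simple greedy approach breaks down.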
Problems with the shotgun approach
T.A. Brown, GENOMES 2, BIOS Scientific Publishers Ltd, 2002
Whole-human genome shotgun assembly
[Figure: scaffold built from sequence reads (GATC ... GATC) linked across 10 and 50 kb inserts]
J. C. Venter et al., Science 291, 1304-1351 (2001). Published by AAAS.

N50: 50% of the assembled sequence lies in contigs of length N50 or greater.
[Figure: perfect 2X coverage compared with random 2X coverage]
Expectation for 7X WGS: 30 kb
HGP 7X WGS mouse assembly: ~24 kb
Waterston RH, Lander ES, Sulston JE (2002) On the sequencing of the human genome. PNAS 99: 3712-16; PNAS 100: 3022-3
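The N50 definition above translates directly into a short calculation: sort the contigs from longest to shortest and walk down the list until half of the total assembled length has been reached. A minimal sketch follows. The function name and the contig lengths are made up for illustration.

```python
def n50(lengths):
    """Smallest contig length L such that contigs of length >= L
    contain at least half of the total assembled sequence."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Hypothetical contig lengths in kb (total = 300).
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))  # 70: the two longest contigs hold 150 of 300 kb
```

Because N50 is weighted by sequence length rather than by contig count, a handful of long contigs can dominate it even when the assembly also contains many short fragments.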
Whole-human genome shotgun assembly
[Figure: hybrid WGS and hierarchical sequencing. Labels: chromosomes; BACs; 2, 10, 50 kb fragments; 5.1X coverage; 2X shred of BACs]
N50 = ~3.6 Mb (2.3 Mb HGP)
N50 = ~86 Kb (82 Kb HGP)
J. C. Venter et al., Science 291, 1304-1351 (2001). Published by AAAS.
Green ED (2001) Strategies for the Sequencing of Complex Genomes. Nature Reviews Genetics 2: 573. PMID: 11483982

How many sequence reads do we need?

$$P(k;\lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

k = number of events = number of times a given base is sequenced
λ = mean number of events = average sequence coverage

Example: average coverage (λ) = 5x. The probability that a given base is sequenced exactly 10 (k) times is $P(10;5) = 5^{10} e^{-5} / 10! = 0.018$, so ~2% of bases will have exactly 10x coverage.

If you sequence at 10x coverage, how much of the genome will be sequenced at least 5 times?
= 1 - probability that a base is sequenced < 5 times
= 1 - [P(0;10) + P(1;10) + P(2;10) + P(3;10) + P(4;10)]
= 0.97

Lander & Waterman, Genomics 2, 231-239 (1988)
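The two Poisson examples can be checked numerically. The sketch below uses only the standard library and reproduces the 0.018 and 0.97 figures. The `reads_needed` helper, together with the genome size and read length passed to it, is an illustrative assumption that follows from coverage = (number of reads x read length) / genome length.

```python
import math

def poisson_pmf(k, lam):
    """P(k; lambda): probability a base is sequenced exactly k times."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def fraction_covered_at_least(min_depth, lam):
    """Fraction of bases sequenced at least min_depth times."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(min_depth))

def reads_needed(genome_len, read_len, coverage):
    """Reads required for a target average coverage (hypothetical helper)."""
    return math.ceil(coverage * genome_len / read_len)

p_exact = poisson_pmf(10, 5)              # exactly 10x at 5x average coverage
frac = fraction_covered_at_least(5, 10)   # at least 5x at 10x average coverage
print(f"{p_exact:.3f}")  # 0.018
print(f"{frac:.2f}")     # 0.97

# Assumed values for illustration: a 3,000 Mbp genome and 500 bp reads.
print(reads_needed(3_000_000_000, 500, 10))  # 60000000
```

The same model gives the fraction of bases with no coverage at all as e^(-λ), which is why modest increases in average coverage shrink the unsequenced fraction so quickly.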
Relationship of sequence coverage and contig length (Θ = fraction of clone overlap needed)
http://www.genome.ou.edu/landerwatermantables1_2_3.htm
Large-scale genome sequence processing, by M Kasahara & S Morishita
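The relationship between coverage and contig length can be sketched with the Lander-Waterman expressions as they are commonly stated. For N reads of length L from a genome of length G, coverage is c = LN/G and σ = 1 - Θ. The expected number of contigs is N e^(-cσ), and the expected contig length is L[(e^(cσ) - 1)/c + (1 - σ)]. The code below is a rough illustration under those assumptions. The function and parameter names are hypothetical, and the genome size, read length and Θ values are chosen only for the example.

```python
import math

def lander_waterman(genome_len, read_len, n_reads, theta):
    """Expected coverage, contig count and contig length.

    theta is the fraction of a read that must overlap for two reads
    to be joined (the slide's "fraction of clone overlap needed").
    """
    c = read_len * n_reads / genome_len   # average coverage
    sigma = 1.0 - theta
    n_contigs = n_reads * math.exp(-c * sigma)
    contig_len = read_len * ((math.exp(c * sigma) - 1.0) / c + (1.0 - sigma))
    return c, n_contigs, contig_len

# Assumed values for illustration only.
genome_len = 3_000_000_000
read_len = 500
theta = 0.1
for coverage in (2, 5, 7, 10):
    n_reads = coverage * genome_len // read_len
    c, n_contigs, contig_len = lander_waterman(genome_len, read_len, n_reads, theta)
    print(coverage, round(n_contigs), round(contig_len))
```

Expected contig length grows roughly exponentially with coverage, while a larger Θ (a stricter overlap requirement) lowers the effective coverage cσ and shortens the expected contigs.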