Characterisation of structural variation in breast cancer genomes using paired-end sequencing on the Illumina Genome Analyser Phil Stephens Cancer Genome Project
Why is it important to study cancer?
Why is it important to study cancer? Causes of cancer How cancers develop Pathways involved
Why is it important to study cancer? Causes of cancer How cancers develop Pathways involved Development of tests for early detection
Why is it important to study cancer? Causes of cancer How cancers develop Pathways involved Development of tests for early detection New targets for anticancer drug development
Why is it important to study cancer? Causes of cancer How cancers develop Pathways involved Development of tests for early detection New targets for anticancer drug development Monitor whether treatment is working
Why is it important to study cancer? Causes of cancer How cancers develop Pathways involved Development of tests for early detection New targets for anticancer drug development Monitor whether treatment is working Personalised treatment of cancer
Why is it important to study cancer? Causes of cancer How cancers develop Pathways involved Development of tests for early detection New targets for anticancer drug development Monitor whether treatment is working Personalised treatment of cancer
Why is it important to study cancer? Causes of cancer How cancers develop Pathways involved Development of tests for early detection New targets for anticancer drug development Monitor whether treatment is working Personalised treatment of cancer
Challenges of studying solid tumours
Breast cancer spectral karyotype C/O Paul Edwards, Departments of Pathology and Oncology, University of Cambridge 73 Chromosomes, 37 structural abnormalities
Amplification MYC in Breast Cancer 20 16 MYC 12 8 4 60 80 100 120 140 Genomic location (Mb)
Point mutations & Indels Pre-cancerous lesion in situ cancer Invasive cancer Metastatic cancer
Point mutations & Indels Pre-cancerous lesion in situ cancer Invasive cancer Metastatic cancer ~20 Cancer causing mutations
Point mutations & Indels Pre-cancerous lesion in situ cancer Invasive cancer Metastatic cancer ~20 Cancer causing mutations 100,000 Passenger mutations
Pessimism around studying solid tumours
BRAF is an oncogene 70% of Melanoma T1799A V600E
BRAF is an oncogene 70% of Melanoma T1799A V600E BRAF is not important Melanoma
BRAF is an oncogene 70% of Melanoma T1799A V600E BRAF is not important Melanoma BRAF is not a good drug target
BRAF is an oncogene 70% of Melanoma T1799A V600E BRAF is not important Melanoma BRAF is not a good drug target Melanoma is too complex to treat effectively
Patient with malignant melanoma Courtesy of Dr Grant McArthur
Selective inhibitor of BRAF V600E (Plexxicon 4032) Courtesy of Dr Grant McArthur
Selective inhibitor of BRAF V600E (Plexxicon 4032) Before 15 days after Courtesy of Dr Grant McArthur
New sequencing technologies
Rearrangement characterisation
Summary of Illumina GA protocol 500bps Randomly shear DNA Size select Sequence from & ligate Illumina adaptors & amplify both ends
Summary of Illumina GA protocol 400 bps 37 bps 37 bps
Summary of Illumina GA protocol 400 bps 37 bps 37 bps Align reads to ref sequence MAQ algorithm Li Heng, http://maq.sourceforge.net/. Genome Res., Nov 2008
Correctly mapping paired end reads 400 bps 37 bps 37 bps Chromosome 11
Incorrectly mapping paired-end reads 400 bps 37 bps 37 bps Chromosome 11 Chromosome 8
PCR amplify in tumour and matched normal Somatic Germline Artefact T N T N T N
PCR amplify in tumour and matched normal Somatic Germline Artefact T N T N T N
PCR amplify in tumour and matched normal Somatic Germline Artefact T N T N T N 6 bps of microhomology at break
Structural variation screen 9 matched breast cancer cell lines 49.0 Gb total data 8.5 mean physical coverage
Illumina GA sequencing:- Broad objectives Structural variation Fusion genes Copy number changes Structure of amplicons
Breast cancer HCC38 spectral karyotype C/O Paul Edwards, Departments of Pathology and Oncology, University of Cambridge 73 Chromosomes, 37 structural abnormalities
Breast cancer HCC38 (Triple neg) 238 somatic structural variants
Breast cancer HCC38 (Triple neg) 150 100 50 238 somatic structural variants Patterns of variation 0
What are these structural variants doing?
Copy number ~160 kb copy number increase 6 5 4 3 2 1 0 48 48.2 48.4 48.6 48.8 49 49.2 49.4 Chromosome 3 position (Mb)
Copy number 164 kb tandem duplication 6 5 4 3 2 T N 1 0 48 48.2 48.4 48.6 48.8 49 49.2 49.4 Chromosome 3 position (Mb)
Copy number 164 kb tandem duplication 6 5 4 3 2 1 0 48 48.2 48.4 48.6 48.8 49 49.2 49.4 Chromosome 3 position (Mb) SLC26A6/PRKAR2A in frame fusion gene SLC26A6 PRKAR2A (Exons 1-17) (Exons 4-11) 5 UTR 3 UTR
Copy number 164 kb tandem duplication Genomic PCR bps 6 5 600 400 T N 4 3 200 T N 2 1 0 48 48.2 48.4 48.6 48.8 49 49.2 49.4 Chromosome 3 position (Mb) SLC26A6/PRKAR2A in frame fusion gene SLC26A6 PRKAR2A (Exons 1-17) (Exons 4-11) 5 UTR 3 UTR
Copy number 164 kb tandem duplication Genomic PCR bps 6 5 600 400 T N 4 3 200 T N 2 1 0 48 48.2 48.4 48.6 48.8 49 49.2 49.4 Chromosome 3 position (Mb) SLC26A6/PRKAR2A in frame fusion gene FISH confirmation of tandem duplication SLC26A6 PRKAR2A (Exons 1-17) (Exons 4-11) 5 UTR 3 UTR
bps 600 400 200 T N RT-PCR
bps 600 400 200 T N RT-PCR
bps 600 400 200 T N RT-PCR
bps 600 400 200 T N RT-PCR Predicted 914 amino acid fusion protein MGLADASGPRDTQALLSATQAMDLRRRDYHMERPLLNQEHLEELGRWGSAPRTHQWRTWLQCSRARAYALLLQHLPVLVWLPRYPVRDWLLGDLLSGL SVAIMQLPQGLAYALLAGLPPVFGLYSSFYPVFIYFLFGTSRHISVGTFAVMSVMVGSVTESLAPQALNDSMINETARDAARVQVASTLSVLVGLFQVGLGLIH FGFVVTYLSEPLVRGYTTAAAVQVFVSQLKYVFGLHLSSHSGPLSLIYTVLEVCWKLPQSKVGTVVTAAVAGVVLVVVKLLNDKLQQQLPMPIPGELLTLIGAT GISYGMGLKHRFEVDVVGNIPAGLVPPVAPNTQLFSKLVGSAFTIAVVGFAIAISLGKIFALRHGYRVDSNQELVALGLSNLIGGIFQCFPVSCSMSRSLVQEST GGNSQVAGAISSLFILLIIVKLGELFHDLPKAVLAAIIIVNLKGMLRQLSDMRSLWKANRADLLIWLVTFTATILLNLDLGLVVAVIFSLLLVVVRTQMPHYSVLGQ VPDTDIYRDVAEYSEAKEVRGVKVFRSSATVYFANAEFYSDALKQRCGVDVDFLISQKKKLLKKQEQLKLKQLQKEEKLRKQAASPKGASVSINVNTSLEDMR SNNVEDCKMVIHPKTDEQRCRLQEACKDILLFKNLDQEQLSQVLDAMFERIVKADEHVIDQGDDGDNFYVIERGTYDILVTKDNQTRSVGQYDNRGSFGEL ALMYNTPRAATIVATSEGSLWGLDRVTFRRIIVKNNAKKRKMFESFIESVPLLKSLEVSERMKIVDVIGEKIYKDGERIITQGEKADSFYIIESGEVSILIRSRTKSN KDGGNQEVEIARCHKGQYFGELALVTNKPRAASAYAVGDVKCLVMDVQAFERLLGPCMDIMKRNISHYEEQLVKMFGSSVDLGNLGQStop
Five expressed in frame fusion genes 2 generated by tandem duplications 3 generated by large inversions
Very small rearrangements:- below the resolution of copy number graphs
Very small rearrangements:- below the resolution of copy number graphs T N 5 kb tandem duplication, spanned by 4 reads 12 13 14 15 14 15 16 17 18 19 TNRC15:- In frame duplication exons 14 & 15
Different patterns of structural variation are emerging from other breast cancers
Patterns of somatic structural variation HCC38 HCC1937 HCC1954 Triple Neg BRCA1 null ERBB2 Positive
Patterns of somatic structural variation HCC38 HCC1937 HCC1954 Triple Neg BRCA1 null ERBB2 Positive
Patterns of somatic structural variation HCC38 HCC1937 HCC1954 Triple Neg BRCA1 null ERBB2 Positive
Copy number Complex patterns of structural variation 10 8 6 4 2 0 0 25 50 75 100 125 150 Solexa copy number:- chromosome 6
Copy number Complex patterns of structural variation 10 8 6 Tandem Duplication 4 2 0 0 25 50 75 100 125 150 Solexa copy number:- chromosome 6
Copy number Complex patterns of structural variation 10 8 6 Tandem Duplication Deletion & Tandem Duplication 4 2 0 0 25 50 75 100 125 150 Solexa copy number:- chromosome 6
Copy number Complex patterns of structural variation t(6;7) translocation 10 8 Tandem Duplication t(6;7) translocation Deletion & Tandem Duplication 6 4 2 0 0 25 50 75 100 125 150 Solexa copy number:- chromosome 6
Copy number Complex patterns of structural variation 10 8 t(6;7) translocation Tandem Duplication Inversion & Tandem Duplication & t(6;7) translocation t(6;7) translocation Deletion & Tandem Duplication 6 4 2 0 0 25 50 75 100 125 150 Solexa copy number:- chromosome 6
Copy number Complex patterns of structural variation 10 8 6 t(6;7) translocation Tandem Duplication Inversion & Tandem Duplication & t(6;7) translocation Inversion & Tandem Duplication & t(6;7) translocation t(6;7) translocation Deletion & Tandem Duplication 4 2 0 0 25 50 75 100 125 150 Solexa copy number:- chromosome 6
Copy number Complex patterns of structural variation 10 8 6 t(6;7) translocation Tandem Duplication Complex Complex Inversion & Tandem Duplication & t(6;7) translocation Inversion & Tandem Duplication & t(6;7) translocation t(6;7) translocation Deletion & Tandem Duplication 4 2 0 0 25 50 75 100 125 150 Solexa copy number:- chromosome 6
Summary of breast cancer cell line data 9 breast cancer cell lines 1152 somatic rearrangements 16 In-frame fusion genes (15/16 expressed)
Structural variation screen on 15 primary breast cancers 2 Luminal A 2 Luminal B 3 Basal like 2 BRCA1 neg 2 BRCA2 neg 3 ERBB2 + 1 Lobular 66.6 Gb total data 7.4 mean physical coverage
199 somatic structural variants Triple negative
Triple negative 150 100 50 0 199 somatic structural variants Patterns of variation
Triple negative 4 2 0 7 somatic structural variants Patterns of variation
ERRB2 positive 50 25 0 62 somatic structural variants Patterns of variation
ERRB2 positive 50 40 30 20 10 0 94 somatic structural variants Patterns of variation
Luminal B expression 200 150 100 50 0 231 somatic structural variants Patterns of variation
Invasive lobular carcinoma 2 1 0 1 somatic structural variant Patterns of variation
Novel expressed in frame fusion gene
Novel expressed in frame fusion gene 16.8 Mb inversion
Novel expressed in frame fusion gene 16.8 Mb inversion
Novel expressed in frame fusion gene 16.8 Mb inversion
Novel expressed in frame fusion gene 16.8 Mb inversion
Novel expressed in frame fusion gene 16.8 Mb inversion
Novel expressed in frame fusion gene 16.8 Mb inversion
Novel expressed in frame fusion gene 16.8 Mb inversion
Very small rearrangements T N 3 kb deletion RB1:- Deletion Exon 13
Summary of primary breast cancer data 15 primary breast cancers 1014 Somatic rearrangements 10 In-frame fusion genes (5 expressed)
How do we identify the driver mutations?
How do we identify the driver mutations? 139 genes rearranged in more than one sample
How do we identify the driver mutations? 139 genes rearranged in more than one sample 22 gene fusions ~300 primary breast cancers FISH cdna PCR (Jorge Reis-Filho) (John Foekins)
Whole exome sequencing (Agilent Sure Select)
Whole exome sequencing (Agilent Sure Select) Shear genomic DNA, ligate Illumina adaptors Hybridise to baits that span exons & wash Sequence form both ends
Breast cancer samples (28 total) 21 x ER + HER2 6 x ER + HER2 + 3 x triple neg
Substitutions in known cancer genes PD3995a AKT1 E17K PD3995a NF1 G-1T PD3994a PIK3CA N345K PD3989a PIK3CA E545K PD3856a PIK3CA H1047R PD3857a PIK3CA H1047R PD3888a PIK3CA H1047R PD3983a PIK3CA H1047R PD3985a PIK3CA H1047R PD3992a PIK3CA H1047R PD3996a PTEN Y27D PD3991a TP53 G245S PD4002a TP53 H179Y PD3987a TP53 Y220C PD3986a TP53 G+1A PD3985a TP53 R306X
High confidence Indels Sample Gene Mutation PD3849a CDH1 V193X PD3984a MAP2K4 V151X PD3992a MAP2K4 I81X PD3989a PTEN L370X
High confidence Indels Sample Gene Mutation PD3849a CDH1 V193X PD3984a MAP2K4 V151X PD3992a MAP2K4 I81X PD3989a PTEN L370X PD3995a GATA3 N352X PD3988a GATA3 N352X PD4004a GATA3 Read through
Novel cancer genes
Novel cancer genes Integrating rearrangement and exome data 2 novel pathways deregulated in a large proportion of ER pos breast cancer
Potential applications in healthcare
Personalised Haematology Risk stratification Therapy duration Prediction of relapse Tumour cells Time (months)
Tumour-specific rearrangements
Plasma DNA Normal tissues Cancer DNA with tumour-specific mutation
Work flow Identify tumour-specific rearrangements from cancer DNA 2 weeks ~ 2,500
Work flow Identify tumour-specific rearrangements from cancer DNA 2 weeks ~ 2,500 Design & test quantitative assay for rearrangements 1 week 250
Work flow Identify tumour-specific rearrangements from cancer DNA 2 weeks ~ 2,500 Design & test quantitative assay for rearrangements Extract DNA from plasma (10-20mL blood sample) 1 week 250 ½ day 50
Work flow Identify tumour-specific rearrangements from cancer DNA 2 weeks ~ 2,500 Design & test quantitative assay for rearrangements Extract DNA from plasma (10-20mL blood sample) 1 week 250 ½ day 50 Quantify tumour DNA ½ day 50
Assay design
Detecting 1 copy of tumour genome
Serial measurements
Potential healthcare applications Monitoring tumour response to therapy in real-time Reduce toxicity, prevent drug wastage
Potential healthcare applications Monitoring tumour response to therapy in real-time Reduce toxicity, prevent drug wastage Identifying disease relapse before clinically evident Pre-emptive therapy
Potential healthcare applications Monitoring tumour response to therapy in real-time Reduce toxicity, prevent drug wastage Identifying disease relapse before clinically evident Pre-emptive therapy Choosing intensity of adjuvant therapy based on risk stratification
Potential healthcare applications Monitoring tumour response to therapy in real-time Reduce toxicity, prevent drug wastage Identifying disease relapse before clinically evident Pre-emptive therapy Choosing intensity of adjuvant therapy based on risk stratification Surrogate marker of cell kill in early phase clinical trials
Potential healthcare applications 100 Breast cancers 100 Colorectal cancers 100 Osteosarcomas
Conclusions Paired-end sequencing is an effective tool for characterising structural variation in complex cancer genomes
Conclusions Paired-end sequencing is an effective tool for characterising structural variation in complex cancer genomes The average breast cancer has ~ 100 somatic structural variants
Conclusions Paired-end sequencing is an effective tool for characterising structural variation in complex cancer genomes The average breast cancer has ~ 100 somatic structural variants The average breast cancer has ~1.0 expressed in-frame fusion gene
Conclusions Paired-end sequencing is an effective tool for characterising structural variation in complex cancer genomes The average breast cancer has ~ 100 somatic structural variants The average breast cancer has ~1.0 expressed in-frame fusion gene Exome sequencing will unveil a multitude of novel drug targets in the next few years
Conclusions Paired-end sequencing is an effective tool for characterising structural variation in complex cancer genomes The average breast cancer has ~ 100 somatic structural variants The average breast cancer has ~1.0 expressed in-frame fusion gene Exome sequencing will unveil a multitude of novel drug targets in the next few years That personalised cancer care is on the horizon
Conclusions Paired-end sequencing is an effective tool for characterising structural variation in complex cancer genomes The average breast cancer has ~ 100 somatic structural variants The average breast cancer has ~1.0 expressed in-frame fusion gene Exome sequencing will unveil a multitude of novel drug targets in the next few years That personalised cancer care is on the horizon, and if you are wealthy enough, already here!