The 100,000 Genomes Project
Harnessing the power of genomics for NHS rare disease and cancer patients
Dr Richard Scott, Clinical Lead for Rare Disease
Dr Nirupa Murugaesu, Clinical Lead for Cancer
Four main aims
1. To bring benefit to NHS patients
2. To create an ethical and transparent programme based on consent
3. To enable new scientific discovery and medical insights
4. To kickstart the development of a UK genomics industry
Genomics England: The Big Data Potential
- 13 NHS Genomic Medicine Centres: rare diseases, cancers and pathogens; broad consent, characteristics, molecular pathology and samples
- NIHR Biosample Centre: DNA & multi-omics repository
- Sequencing Centre: Wellcome Trust £27m; Illumina Partnership
- Refreshable identifiable clinical data: life-course registry, linked to anonymised whole genome sequence
- NHS Digital: primary care, hospital episodes, cancer registries, rare registries, infectious disease, mortality data, patient entry
- Firewall
- Research Data Infrastructure (MRC £24m): sequential builds of de-identified data and WGS; safe haven (reading library); annotation & QC; SME product comparison; only processed results pass outside
- Clinicians & Academics: £25m in training, 700 person years
- Industry: GENE Consortium
Programme milestones (timeline figure)
- Illumina Partnership: 31/07/2014
- Sequencing Centre: 31/07/2014; January 2016
- 11 NHS Genomic Medicine Centres (GMCs) awarded: 20th December 2014
- Data Centre: November 2014
Genomics England Clinical Interpretation Partnership (GeCIP)
- 2,500 prospective GeCIP domain members
- 300 institutions, 24 countries

Institution             Count
UK Academic             1744
NHS Trust               634
International Academic  198
Other                   333
GeCIP Research Domains
Total number of researchers: 2250
Rare disease:
- Cardiovascular
- Endocrine and Metabolism
- Gastroenterology and Hepatology
- Hearing and Sight
- Immunology and Haematology
- Inherited Cancer Predisposition
- Musculoskeletal
- Neurological
- Paediatric Sepsis
- Paediatrics
- Renal
- Respiratory
- Skin
Cancer:
- Adult Glioma
- Bladder
- Breast
- Colorectal & upper GI
- Lung
- Melanoma
- Renal Cell
- Sarcoma
- Testis
- Ovarian
- Prostate
- Childhood Solid Cancers
- Haematological Malignancy
- Pan Cancer
- (Ca of) Unknown primary
Functional / cross-cutting:
- Electronic Health Records
- Validation and Feedback
- Ethics and Social Science
- Functional Effects
- Health Economics
- Machine Learning, Quantitative Methods and Functional Genomics
- Population Genomics
- Enabling Rare Disease Translational Genomics via Advanced Analytics and International Interoperability
- Functional Cross Cutting
- Education and Training
- Stratified Medicine & Pharmacogenomics
100,000 Genomes: data types
- 22,000 people; 1.4 million episodes; 96%
- Life course data (secondary sources): GP / medication; hospital episodes; cancer registries; maternal and child registries; mental health and intellectual disability; lab tests; diagnostic imaging; screening services; ONS death details
- Genomic results: interpretation; clinical application; somatic; germline
- GMC clinical baseline: sample metadata; Ca: diagnosis & staging; RD: phenotypes; lab test results; histopathology; treatments; radiology text; risk factors; sample images
- GMC registration: demographics; consent status; additional findings; registration; RD: pedigree
Rare inherited diseases
- 5000 people in the pilot; 17,700 in the main programme
- ~5% of the population; 7000+ rare diseases
- 209 recruitment categories
- Detailed standardised eligibility & characterisation
- 40% less data
- Prior testing as in usual care
- Clinical data set: pedigree; Human Phenotype Ontology terms; clinical test data from hospital systems; secondary data
23/02/2017
Recruitment by specific disease
Top recruiting diseases: 25 with >100 genomes
Recruitment by disease group
Low-recruiting groups highlighted, including:
- dysmorphic; endocrine; haematological/immunological; metabolic and respiratory disorders
[Bar chart: number of families recruited per disease group; y-axis "Families", 0 to 1600]
What are we telling participants?
- Information about a patient's main condition
- Information about additional serious and actionable conditions (optional)
- Carrier status for non-affected parents of children with rare disease (optional)
Image courtesy of Health Education England
Scalable rare disease diagnostics
[Workflow diagram; labels in extraction order]
- Patient/family: DNA; phenotypes & pedigree; clinical assessment; validation; outcomes
- GeCIP(s): gene panels
- Report; QC; reporting tool
- Genome sequence; annotated VCFs; gene panel; variant filtering; tiered variants; review
- Annotation companies
Empowering recruiting clinicians to direct the analysis
1. Presented with a summary of data entered
2. Option to control the genome analysis
   - Select gene panels
   - Specify whether non-penetrance should be permitted
   - Control analysis in families with multiple monogenic disorders segregating separately
Interpretation at the hospitals
1. Automated first-line analysis set up by clinician
   - Mirrors standard diagnostic analysis
   - 'Tier 1 or 2' variants: 0-5 variants
2. Interpretation software allows the lab-clinical team to carry out further, bespoke analysis if necessary
   - 'Tier 3': standard GeL pipeline but not restricted to gene panels; 25-100 variants depending on family structure
3. Lab-clinical team curate variants and record outcomes
   - Central record of variant interpretation for all labs to see, building a National Genomics Knowledgebase
   - Record clinical impacts of diagnostic result
Jessica
Epileptic encephalopathy type 9 (GLUT1)
- Difficult-to-treat seizures
- Developmental delay
- Standard tests found no cause
- Now 4 years old
- De novo truncation in GLUT1 found
- Ends 4-year diagnostic odyssey
- Provides possible tailored therapy (ketogenic diet)
- Informs parents on risk of recurrence in another child (very low)
An infant with immune deficiency
Transcobalamin 2 deficiency (TCN2)
- Recruited as a trio with consanguineous parents
- Failure to thrive; low blood counts; disseminated viral infection
- Unclassified immune deficiency
- Died at 5 months of age
- Mother pregnant
- Transcobalamin 2 deficiency identified as the cause
- Treatable with B12 supplementation
- New child also affected; treated with B12 and doing well
Cancer Programme Dr Nirupa Murugaesu, Clinical Lead for Cancer
Cancer: disease of two genomes
- The patient (germline/predisposing)
- The tumour (somatic)
Cancer - Molecular Lesions
- Diagnosis
- Prognosis
- Disease Monitoring
Current Practice in Cancer
Single gene test:
- companion diagnostics, many different technologies
The promise of Precision Medicine in NHS Cancer Care
- Diagnosis and management: from being defined by topography and histology to being defined by the molecular profile of the tumour
- Molecular characterisation at multiple stages in the tumour pathway, with molecular markers for: diagnosis; prognosis; predicting response to drugs; monitoring
- Development of novel targeted molecules, intelligent repurposing of drugs, cancer vaccines against novel neoantigens
Cancer Gene Panel
- Ion semiconductor sequencing: based on the detection of hydrogen ions that are released during the polymerization of DNA
- 50-gene cancer hotspot panel
Clinical Impact? - More uniform molecular diagnostics & profiling of cancer
Preliminary Analysis
https://www.genomicsengland.co.uk/information-for-gmc-staff/cancerprogramme/genome-analysis/
Preliminary Analysis
Domain 1 variants
- Variants in a virtual panel of potentially actionable genes*.
- Actionable genes are defined as genes in which small variants (SNVs and indels <50bp) have reported therapeutic, prognostic or clinical trial associations**, as defined by the GenomOncology Knowledge Management System.
Domain 2 variants
- Variants in a virtual panel of cancer-related genes***.
- Cancer-related genes are defined as genes in which any variants have been causally implicated in cancer, as defined by the Cancer Gene Census (Wellcome Trust Sanger Institute).
Domain 3 variants
- Small variants in genes not included in domains 1 & 2.
* Current potentially actionable genes for solid tumours: 77 genes, listed at "Actionable genes in solid tumour v1.1".
** Links are provided to clinical trials within the United Kingdom which are either actively recruiting participants or closed to recruitment.
*** Current cancer-related genes: 590 genes, listed in the "Cancer census genes v1.1" document.
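The domain assignment described above can be sketched as a simple gene-membership lookup. This is only an illustration, not the Genomics England pipeline: the gene names are hypothetical placeholders (the real panels are the 77-gene and 590-gene lists cited on the slide), the function name `assign_domain` is invented here, the input is assumed to be already restricted to small variants, and giving Domain 1 priority when a gene appears in both panels is an assumption.

```python
# Hypothetical placeholder panels; NOT the real Genomics England gene lists.
ACTIONABLE_GENES = {"GENE_A", "GENE_B"}               # stands in for the 77-gene panel
CANCER_RELATED_GENES = {"GENE_B", "GENE_C", "GENE_D"}  # stands in for the 590-gene panel


def assign_domain(gene: str) -> int:
    """Return the reporting domain (1, 2 or 3) for a small variant in `gene`.

    Assumption: a gene present in both panels is reported under Domain 1.
    """
    if gene in ACTIONABLE_GENES:
        return 1
    if gene in CANCER_RELATED_GENES:
        return 2
    return 3  # small variant in a gene outside both virtual panels


if __name__ == "__main__":
    for gene in ["GENE_A", "GENE_B", "GENE_C", "GENE_Z"]:
        print(gene, "-> Domain", assign_domain(gene))
```

Keeping the panels as plain sets means a new panel version can be swapped in without touching the assignment logic.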
Domain 1 variants
Link to Knowledge Database
Link to ClinicalTrials.gov
Supplementary Analysis
Circos plot: genome-wide visualisation of somatic variants and sequencing depth
Chromosomes are arranged sequentially around the circumference as indicated. The information presented in each track is as follows:
- Track 1 (innermost track): chromosomes
- Track 2 (in red): number of somatic SNVs in 2 Mb window; scale from 0 to 100
- Track 3 (in green): number of somatic indels in 2 Mb window; scale from 0 to 35
- Track 4: ratio of normalised depth of coverage for tumour vs normal in log2 scale, smoothed over 100 kb windows. Diploid regions have a value of 0. Scale is between -2 and 2. Regions with coverage below 15x in germline are not shown. CNV losses are indicated in red, CNV gains in green, and copy-neutral LOH regions in yellow.
- Track 5 (outermost track, in blue): absolute depth of coverage in tumour sample
Structural variants (SVs) are indicated by arcs inside the plot; translocations are indicated in green, inversions in purple. SVs shorter than 100 kb and insertions are not plotted.
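The Track 4 quantity can be illustrated with a short calculation. The sketch below is not the Genomics England implementation: it assumes the inputs are already mean depths per 100 kb window, it normalises each sample by its own mean depth (the slide does not say how normalisation is done), and the function name `log2_depth_ratio` is invented here. The 15x germline cut-off and the -2 to 2 plotting scale are taken from the slide.

```python
import math


def log2_depth_ratio(tumour_depth, normal_depth, min_normal_depth=15.0):
    """Per-window log2(tumour / normal) of mean-normalised depth.

    Windows whose germline (normal) depth is below `min_normal_depth`
    are returned as None, i.e. not shown. Values are clipped to [-2, 2].
    """
    tumour_mean = sum(tumour_depth) / len(tumour_depth)
    normal_mean = sum(normal_depth) / len(normal_depth)
    ratios = []
    for t, n in zip(tumour_depth, normal_depth):
        if n < min_normal_depth or t <= 0:
            ratios.append(None)  # low germline coverage or no tumour reads
            continue
        value = math.log2((t / tumour_mean) / (n / normal_mean))
        ratios.append(max(-2.0, min(2.0, value)))  # clip to the plotted scale
    return ratios


if __name__ == "__main__":
    tumour = [40.0, 40.0, 20.0, 60.0]  # third window: loss; fourth window: gain
    normal = [30.0, 30.0, 30.0, 30.0]
    print(log2_depth_ratio(tumour, normal))
```

With these inputs the first two windows come out at 0 (diploid), the third at -1 (half the expected depth) and the fourth above 0 (a gain).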
Comparison with TCGA data
Mutational signature analysis
- Six classes of base substitution: C>A, C>G, C>T, T>A, T>C, T>G
- Each is considered together with the bases immediately 5' and 3' to the mutated base
- 96 possible mutations in this classification
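The figure of 96 follows from 6 substitution classes x 4 possible 5' bases x 4 possible 3' bases. A minimal sketch that enumerates the classes is given below; the `A[C>A]A` label format is a commonly used notation chosen here for illustration, and the function name is hypothetical.

```python
from itertools import product

BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def trinucleotide_classes():
    """List every substitution class with its 5' and 3' flanking base."""
    return [
        f"{five_prime}[{sub}]{three_prime}"
        for sub in SUBSTITUTIONS
        for five_prime, three_prime in product(BASES, repeat=2)
    ]


if __name__ == "__main__":
    classes = trinucleotide_classes()
    print(len(classes))  # 6 substitutions x 4 x 4 flanking bases
    print(classes[:4])
```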
[Mutational signature figure; panel labels: defective dsDNA repair; smoking; defective MMR repair; UV light]
Transformation of Molecular Pathology
[Workflow diagram labels: Patient; Sample; Sequencing; Interpretation; Clinical Report; Clinical Validation; Data; Research; Data Archive]
100,000 Genomes - data types
- Life course data (secondary sources): HES: APC, OP, A&E, CCMDS; Cancer registry: COSD, SACT, RTDS; GP medication; DID; NCASP; lab tests / DID; screening services; GP / medication; ONS death details
- Genomic results: interpretation; clinical application; somatic; germline
- GMC clinical baseline: sample metadata; Ca: diagnosis & staging; RD: phenotypes; lab test results; histopathology; treatments; radiology text; risk factors; sample images
- GMC registration: demographics; consent status; additional findings; registration; RD: pedigree
Acknowledgements
- Genomics England
- Cancer Team
- Bioinformatics Team
- Informatics Team
- Clinical Data Team
- GeCIPs
- NHS England
- NHS Genomic Medicine Centres
- Health Education England
- Cancer Working Group
- Validation & Feedback Working Group
- Illumina/R&D Team
- Academics (GeCIP)
- Commercial Partnerships (GENE)
Questions?