Identifying Mutations Responsible for Rare Disorders Using New Technologies. Jacek Majewski, Department of Human Genetics, McGill University, Montreal, QC, Canada
Mendelian Diseases. Clear mode of inheritance: dominant or recessive. High penetrance: having the mutation determines the phenotype with near certainty. Clear phenotypic consequences. Low environmental influence: lack of phenocopies. Examples: Tay-Sachs disease, cystic fibrosis.
Finding causes of Mendelian Disorders. Although many of these disorders are rare (1:2000 to 1:100,000 incidence), taken together they constitute a substantial health burden. It is estimated that over 90% of Mendelian disorders are caused by mutations in the coding regions (missense, nonsense, frameshifts, in/dels, splicing). Traditionally, the approach was genetic linkage analysis, followed by targeted sequencing of candidate genes. In view of new technologies, can we do better?
1993
Homozygosity mapping. Today: The American Journal of Human Genetics (2010), doi:10.1016/j.ajhg.2010.09.005. Conventional mutation mapping strategy for autosomal recessive disorders. Here we collected four patients from consanguineous Bedouin families in Qatar with Van den Ende-Gupta Syndrome (VDEGS). Genotyping using SNP arrays identified a region of shared homozygosity, presumably inherited from a common ancestor and containing the mutation: 2.4 Mb in length, containing 44 genes.
We concurrently conducted conventional Sanger sequencing of candidate genes and whole-exome sequencing. In this case, the targeted candidate approach was the winner, identifying two distinct pathogenic mutations in the SCARF2 gene. A few days later, results from exome sequencing confirmed this result and helped to exclude all other candidate genes in the region. A. Presence and effect of the c.1328_1329delTG mutation. B. Presence and effect of the c.773G>A mutation. C. Conservation of residues near the c.773G>A (p.C258Y) mutation. D. Exome capture and sequencing results in the mother of patient 1. Sanger Sequencing and Exome Capture.
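The idea behind homozygosity mapping can be illustrated with a toy sketch: scan the SNP markers in genomic order and report stretches where every patient is homozygous. This is only an illustration under simplifying assumptions (genotypes encoded as two-letter strings such as `AA` or `AB`, no genotyping error, no minimum physical length); the function names, the `min_markers` parameter, and the toy genotypes are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def is_homozygous(genotype: str) -> bool:
    """True for calls such as 'AA' or 'BB'; False for 'AB' or a no-call 'NN'."""
    return len(genotype) == 2 and genotype[0] == genotype[1] and genotype[0] != "N"


def shared_homozygous_runs(
    positions: Sequence[int],
    genotypes_by_patient: Sequence[Sequence[str]],
    min_markers: int = 3,
) -> List[Tuple[int, int, int]]:
    """Return (start_pos, end_pos, n_markers) for runs of consecutive markers
    at which all patients are homozygous."""
    runs: List[Tuple[int, int, int]] = []
    run_start = None  # index of the first marker in the current run
    run_len = 0
    for i in range(len(positions)):
        if all(is_homozygous(g[i]) for g in genotypes_by_patient):
            if run_start is None:
                run_start = i
            run_len += 1
        else:
            # A heterozygous call in any patient ends the current run.
            if run_start is not None and run_len >= min_markers:
                runs.append((positions[run_start], positions[i - 1], run_len))
            run_start, run_len = None, 0
    if run_start is not None and run_len >= min_markers:
        runs.append((positions[run_start], positions[-1], run_len))
    return runs


# Toy data: 7 markers, 3 patients (made-up genotypes).
positions = [100, 200, 300, 400, 500, 600, 700]
patients = [
    ["AB", "AA", "BB", "AA", "AA", "AB", "BB"],
    ["AA", "AA", "BB", "BB", "AA", "AA", "AB"],
    ["AB", "BB", "BB", "AA", "AA", "AB", "AA"],
]
runs = shared_homozygous_runs(positions, patients, min_markers=3)
print(runs)
```

The sketch does not require patients to share the same allele at each marker, only that each is homozygous there, which allows for different families carrying different ancestral haplotypes over the same region.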
High Throughput Sequencing. E.g., Illumina HiSeq or ABI SOLiD. Main characteristics: typically produce millions to hundreds of millions of short (50-100 bp) sequencing reads per application. Reads can be aligned to the reference genome and variants identified. Currently, sequencing of the entire human genome is still quite expensive ($9k today). Can we sequence a subset of the genome?
What is Exome Capture? A method that uses specific probes to capture only the coding portions of all annotated exons within the genome. This is followed by high throughput shotgun sequencing. Our group uses the Agilent SureSelect Human All Exon Kit (in-solution beads), followed by Illumina GAIIx 76 bp read sequencing (single lane).
Coverage Statistics. Exome coverage obtained by the Agilent All Exon in-solution capture process, followed by Illumina sequencing. The bars represent coverage from 1, 2, and 3 lanes of 76-base reads. E.g., two lanes of sequence produce an average 52.3X coverage of the exome; 90% of the exome is covered at >10X, and 95% at >5X. In our experience, 2 lanes of sequencing provide excellent coverage at a reasonable cost; the slight improvement in coverage from additional lanes does not warrant the added expense. Current bottom line: it should be possible to get a high-quality 50X exome for around $2k.
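The coverage figures quoted above (mean depth, fraction of the exome covered at >10X and >5X) are simple summaries of per-base read depth. A minimal sketch of how such numbers could be computed is shown below, assuming a list of per-base depths over the targeted exome has already been extracted; the function name and the toy depths are hypothetical.

```python
from typing import Dict, Iterable, Sequence


def coverage_summary(
    depths: Sequence[int], thresholds: Iterable[int] = (5, 10)
) -> Dict[str, float]:
    """Mean depth and fraction of bases covered at strictly more than each threshold."""
    if not depths:
        raise ValueError("depths must not be empty")
    n = len(depths)
    summary = {"mean": sum(depths) / n}
    for t in thresholds:
        summary[f"frac_gt_{t}x"] = sum(1 for d in depths if d > t) / n
    return summary


# Toy per-base depths for ten targeted bases (made-up values).
toy_depths = [0, 3, 8, 12, 20, 40, 55, 60, 75, 90]
summary = coverage_summary(toy_depths)
print(summary)
```

Comparing these summaries across 1, 2, and 3 lanes is one way to judge whether an extra lane adds enough coverage to justify its cost.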
Current Analysis Pipeline. Exome capture and sequencing: Genome Quebec platforms. Base calling, QC: standard Illumina pipeline. Alignment to reference genome: BWA; retain unique alignments only. SNV and in/del calls: SAMTools, GATK. CNV calls: in house. Annotation of functional variants: ANNOVAR. Filtering: other exomes (~50), key!; dbSNP; 1000 Genomes. Relax filtering criteria if necessary.
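The filtering step can be sketched as follows. This is an illustrative outline only, not the group's actual pipeline: it assumes variants have already been called and annotated, that dbSNP and 1000 Genomes are available as sets of (chromosome, position, ref, alt) keys, and that the other in-house exomes are summarized as a count per variant. The `Variant` class, the effect labels, the `max_inhouse` parameter, and the gene names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)

# Hypothetical effect labels for protein-altering variants.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "indel", "splicing"}


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str
    effect: str

    @property
    def key(self) -> VariantKey:
        return (self.chrom, self.pos, self.ref, self.alt)


def filter_candidates(
    variants: List[Variant],
    dbsnp: Set[VariantKey],
    thousand_genomes: Set[VariantKey],
    inhouse_counts: Dict[VariantKey, int],
    max_inhouse: int = 0,
) -> List[Variant]:
    """Keep protein-altering variants absent from public databases and seen in
    at most `max_inhouse` other in-house exomes."""
    kept = []
    for v in variants:
        if v.effect not in PROTEIN_ALTERING:
            continue
        if v.key in dbsnp or v.key in thousand_genomes:
            continue
        if inhouse_counts.get(v.key, 0) > max_inhouse:
            continue
        kept.append(v)
    return kept


# Toy data (made-up variants and gene names).
variants = [
    Variant("chr1", 1000, "G", "A", "GENE_A", "missense"),    # novel
    Variant("chr1", 2000, "C", "T", "GENE_B", "synonymous"),  # not protein-altering
    Variant("chr2", 3000, "T", "C", "GENE_C", "nonsense"),    # present in dbSNP
    Variant("chr3", 4000, "A", "G", "GENE_D", "frameshift"),  # seen in 2 other exomes
]
dbsnp = {("chr2", 3000, "T", "C")}
thousand_genomes: Set[VariantKey] = set()
inhouse_counts = {("chr3", 4000, "A", "G"): 2}

strict = filter_candidates(variants, dbsnp, thousand_genomes, inhouse_counts)
relaxed = filter_candidates(
    variants, dbsnp, thousand_genomes, inhouse_counts, max_inhouse=2
)
print([v.gene for v in strict], [v.gene for v in relaxed])
```

Raising `max_inhouse` is one way to "relax filtering criteria if necessary", for example when a recessive allele may be carried by unaffected individuals among the other exomes.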
END USER OUTPUT
Filtering The Data
Visualization, manual QC and inspection
Ongoing Projects. Part of the FORGE consortium, Canada-wide (GC/CIHR). Part of the IGNITE project, Dalhousie-Atlantic Canada (GC). RaDiCAL (McGill/MUHC). Numerous collaborations with individual researchers: McGill, Quebec, Canada, France, Poland, Qatar, Lebanon.
Successes so far (new genes): Fowler Syndrome; Mitochondrial Disease (2); Vitamin B12 metabolism (2); Nephrotic Syndrome; LCA; Hajdu-Cheney Syndrome; Hyper IgM (2); Novel developmental dysmorphisms (2).
Two unrelated patients; total study time = 2 weeks from DNA sample to gene.
Schematic of the FLVCR2 mutations found in two patients (F1 and F3). A. The four mutations in FLVCR2 identified in F1 and F3 using whole exome sequencing, as visualized using the Integrative Genomics Viewer from the Broad Institute (http://www.broadinstitute.org/igv). B. Mutations in FLVCR2 are shown relative to the protein domains.
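When several unrelated patients with a recessive disorder are sequenced, one way to narrow the search is to keep only genes that carry at least two candidate alleles in every patient (a homozygous variant, or two different heterozygous variants). The sketch below illustrates that intersection under strong assumptions: variants are already filtered, each call is reduced to a (gene, allele count) pair, and phase is ignored, so two heterozygous variants on the same chromosome copy would be counted incorrectly. The function names, patient labels, and gene names are hypothetical toy values, not the study's data.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Call = Tuple[str, int]  # (gene, number of variant alleles: 1 = het, 2 = hom)


def recessive_candidate_genes(calls: List[Call]) -> Set[str]:
    """Genes with at least two candidate alleles in one patient."""
    alleles: Counter = Counter()
    for gene, n_alleles in calls:
        alleles[gene] += n_alleles
    return {gene for gene, total in alleles.items() if total >= 2}


def genes_shared_by_all(calls_by_patient: Dict[str, List[Call]]) -> Set[str]:
    """Genes that fit the recessive model in every patient."""
    per_patient = [recessive_candidate_genes(c) for c in calls_by_patient.values()]
    if not per_patient:
        return set()
    return set.intersection(*per_patient)


# Toy data: two unrelated patients (made-up calls).
calls_by_patient = {
    "P1": [("GENE_A", 1), ("GENE_A", 1), ("GENE_X", 1)],  # two het hits in GENE_A
    "P2": [("GENE_A", 2), ("GENE_Y", 2), ("GENE_X", 1)],  # hom hits in GENE_A, GENE_Y
}
shared = genes_shared_by_all(calls_by_patient)
print(shared)
```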
Mol Cell Biol. 2010 Sep 7. [Epub ahead of print] The Fowler Syndrome associated protein FLVCR2 is an importer of heme. Duffy SP, Shing J, Saraon P, Berger LC, Eiden MV, Wilde A, Tailor CS. Program in Cell Biology, The Hospital for Sick Children, Toronto, ON M5G 1X8, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5G, Canada; National Institute of Mental Health, Laboratory of Cellular & Molecular Regulation, Bethesda, MD 20892 USA.
Hajdu-Cheney Syndrome (with Mark Samuels and Jeremy Schwartzentruber). Rare dominant disease, characterized by bone deterioration (our first dominant, FORGE project). Fewer than 100 patients known worldwide. DNA from 3 affected family members + 3 unrelated individuals. Exome sequencing. Identification of variants: gene found 2 days after sequence data became available. All patients have truncating mutations in the last exon of the gene NOTCH2. Paper prepared for publication during the next week. But...
Monday...
Summary. Exome sequencing is a fast and efficient way to identify new disease genes. In our hands, success rates vary from 100% (families) to 33% (single individuals). Other applications: genomic molecular diagnosis, genetic testing.
Summary. Exome sequencing is a rapid and increasingly affordable method of identifying disease mutations. CNVs (large structural variants) can be detected. We have successfully found mutations in a number of additional recessive disorders, using only a single individual. In the next few months, dozens of new disease genes/mutations will be identified. Next step: Genomic Molecular Diagnosis.