Genes, Diseases and Lisa How an advanced ICT research infrastructure contributes to our health Danielle Posthuma Center for Neurogenomics and Cognitive Research VU Amsterdam
Most human diseases are heritable [Figure: bar chart of heritability, axis 0 to 100, for autism, schizophrenia, ADHD, nicotine dependence, anorexia nervosa, hypertension, obsessive compulsive disorder, post traumatic stress disorder, major depression, anti-social behavior, anxiety and sleep problems]
If most human diseases are heritable, genes must exist that influence disease. Understanding which genetic variants are associated with risk of disease is important for prevention and treatment of disease.
Finding genes for disease Two issues generate computational problems: 1. Most disorders are polygenic 2. Scale of genetic variation
1. Monogenic vs. polygenic diseases. Monogenic: influenced by one gene; large genetic effect; small sample size needed. Polygenic: influenced by 100s or 1000s of genes; small genetic effects; huge sample size needed; possible interactions between genes.
2. Scale of genetic variation: the human genome. 3 billion base pairs (nucleotides) arranged on 23 pairs of chromosomes. 99.9% of all base pairs are invariant across humans; 0.1% varies = 3 million differences between humans.
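The arithmetic on this slide can be checked in a few lines. This is only a sketch of the rounded slide figures; the constant and function names are illustrative and not part of the original talk.

```python
# Rounded figures taken from the slide.
GENOME_BASE_PAIRS = 3_000_000_000  # about 3 billion base pairs
VARIABLE_PER_THOUSAND = 1          # 0.1% = 1 in every 1000 nucleotides


def variable_sites(genome_size: int, per_thousand: int) -> int:
    """Number of positions expected to differ between humans."""
    return genome_size * per_thousand // 1000


n_variants = variable_sites(GENOME_BASE_PAIRS, VARIABLE_PER_THOUSAND)
print(n_variants)  # 3000000, i.e. 3 million differences
```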
1 in every 1000 nucleotides differs...atgcagccccggacacagcccctagcc CAAACCCTACCCTTCTTCCTCGGAGGGG CCCCTCGAGACACTGGGCTGCGGGTGCC TGTCATTAAGATGGGCACAGGGTGGGAG GGCTTCCAGCGGACCCTGAAGGAAGTCG CCTACATCCTCCTCTGCTGCTGGTGTATCA AGGAACTGCTGGATTAA...
Since 2005: genome-wide association feasible [Timeline figure: Human Genome Project; 2001, 2006, 2010]
Gene finding for disease. Goal: determine which of 3 million genetic variants are linked to disease. Small effects, thus large samples needed (N = 1,000 to 120,000). Millions of genetic association tests. Huge files.
Computational problems. Example: one relatively small dataset of ±5,000 individuals and 3 million genotypes is ~80 GB; 10,000 individuals: ±1.5 TB. Full analysis, including quality control tests, genetic imputation and the actual analysis, would take many months if not years on a single computer. Cluster computers are essential.
Genetic association tests are embarrassingly parallel. Each genetic variant can be analyzed in isolation. Even when considering gene-gene interactions, groups of genes can be analyzed in isolation. Typically, statistical geneticists use lots of disk space and submit 1000s of jobs at a time.
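To illustrate why per-variant tests parallelize so easily, here is a minimal sketch, not the actual GCC workflow. It assumes a simple allelic chi-square test on invented allele counts, and uses a local thread pool as a stand-in for submitting one cluster job per chunk of variants. All names and the toy data are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square statistic (1 df) for a 2x2 allele-count table."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    denom = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
             * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    if denom == 0:  # monomorphic variant or empty group: no signal
        return 0.0
    return n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / denom


def analyse_chunk(chunk):
    """One 'job': test every variant in the chunk, independently of all others."""
    return [(variant_id, allelic_chi2(*counts)) for variant_id, counts in chunk]


def split_into_chunks(items, chunk_size):
    """Cut the variant list into fixed-size pieces, one per job."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def make_toy_variants(n_variants, seed=1):
    """Invented allele counts for 100 cases and 100 controls (200 alleles each)."""
    rng = random.Random(seed)
    variants = []
    for i in range(n_variants):
        case_alt = rng.randint(0, 200)
        ctrl_alt = rng.randint(0, 200)
        counts = (case_alt, 200 - case_alt, ctrl_alt, 200 - ctrl_alt)
        variants.append((f"var{i}", counts))
    return variants


variants = make_toy_variants(1000)
chunks = split_into_chunks(variants, 100)

# The thread pool stands in for a cluster scheduler running one job per chunk.
with ThreadPoolExecutor(max_workers=4) as pool:
    chunk_results = list(pool.map(analyse_chunk, chunks))

parallel_results = [row for chunk in chunk_results for row in chunk]
serial_results = analyse_chunk(variants)
```

Because no variant depends on any other, the chunked run gives exactly the same results as the serial run, and the chunk size can be tuned freely to the number of available nodes.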
Genetic Cluster Computer @LISA. Set up in 2006 with an NWO investment grant as part of LISA. Currently financed by VU. Goal: promote use of cluster computers in gene finding studies. www.geneticcluster.org
Genetic Cluster Computer (GCC) @LISA > 150 users all over the world > 50 publications are based on analyses carried out at GCC (incl. Nature, Nature Genetics, Mol Psychiatry, Neuron)
Large consortia use GCC. "The purpose of the Psychiatric GWAS Consortium (PGC) is to conduct meta-analyses of genome-wide association study (GWAS) data. Our primary focus is on five important psychiatric disorders: autism, attention-deficit hyperactivity disorder, bipolar disorder, major depressive disorder, and schizophrenia. PGC includes 111 scientists from 48 institutions and 11 countries and is the largest collaborative effort in the history of psychiatry where so many have come together so quickly to form an effective and forward-looking collaboration. We are deeply grateful to the National Institute of Mental Health (NIMH), NARSAD, and the Netherlands Genetic Computing Cluster for their sponsorship of the PGC." GCC plays a central role in data storage and analysis for the PGC.
Recent highlights. Lips ES, Cornelisse LN, Toonen RF, Min JL, Hultman CM; the International Schizophrenia Consortium, Holmans PA, O'Donovan MC, Purcell SM, Smit AB, Verhage M, Sullivan PF, Visscher PM, Posthuma D. Functional gene group analysis identifies synaptic gene groups as risk factor for schizophrenia. Mol Psychiatry. 2011 Sep 20. doi: 10.1038/mp.2011.117. Ripke S, et al. Genome-wide association study identifies five new schizophrenia loci. Nat Genet. 2011 Sep 18;43(10):969-76.
Synaptic mechanisms underlying schizophrenia. Lips et al. applied a novel method in which groups of functionally related genes were tested for association with schizophrenia.
[Figure: analysis flowchart. Group of functionally related genes -> get all variants on genotyping platform (dbSNP) -> GWAS samples 1 to 5 -> run genetic association -> obtain Σ -log10(p) -> 10,000 permutations -> empirical P-value of gene group -> combine empirical P-values across samples -> empirical P-value of combined empirical P-value. Control methods: random groups of genes/SNPs, 100 draws, taken through the same steps.]
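The core of the gene-group test (sum -log10(p) over the group's variants, then compare against permutations) can be sketched as follows. This is an illustrative reading of the flowchart, not the code of Lips et al. It assumes that case/control labels are what gets permuted, uses a simple allelic chi-square test per SNP, and runs on invented toy genotypes with 200 permutations instead of 10,000 to keep it fast. All function names are hypothetical.

```python
import math
import random


def snp_pvalue(genotypes, is_case):
    """Allelic chi-square test (1 df) for one SNP; genotypes are 0/1/2 allele counts."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    case_alt = sum(g for g, c in zip(genotypes, is_case) if c)
    ctrl_alt = sum(genotypes) - case_alt
    case_ref = 2 * n_case - case_alt
    ctrl_ref = 2 * n_ctrl - ctrl_alt
    n = 2 * (n_case + n_ctrl)
    denom = (2 * n_case) * (2 * n_ctrl) * (case_alt + ctrl_alt) * (case_ref + ctrl_ref)
    if denom == 0:
        return 1.0
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / denom
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def group_statistic(group_genotypes, is_case):
    """Sum of -log10(p) over all SNPs in the gene group."""
    return sum(-math.log10(max(snp_pvalue(g, is_case), 1e-300))
               for g in group_genotypes)


def empirical_pvalue(group_genotypes, is_case, n_permutations, rng):
    """Share of label permutations whose statistic reaches the observed one."""
    observed = group_statistic(group_genotypes, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # assumption: phenotype labels are permuted
        if group_statistic(group_genotypes, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


def toy_group(n_snps, is_case, freq_case, freq_ctrl, rng):
    """Invented genotypes: two alleles per person, drawn with a group-specific frequency."""
    group = []
    for _ in range(n_snps):
        snp = []
        for c in is_case:
            f = freq_case if c else freq_ctrl
            snp.append(int(rng.random() < f) + int(rng.random() < f))
        group.append(snp)
    return group


rng = random.Random(42)
is_case = [1] * 100 + [0] * 100
associated = toy_group(5, is_case, 0.6, 0.2, rng)  # allele enriched in cases
null_group = toy_group(5, is_case, 0.3, 0.3, rng)  # no true difference

p_associated = empirical_pvalue(associated, is_case, 200, rng)
p_null = empirical_pvalue(null_group, is_case, 200, rng)
```

With permutations the smallest attainable P-value is 1 / (n_permutations + 1), which is one reason the flowchart calls for 10,000 permutations per gene group, and why this analysis benefits from a cluster.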
Synaptic mechanisms underlying schizophrenia. Lips et al. report three specific synaptic mechanisms underlying schizophrenia.
Genome-wide association study identifies five new schizophrenia loci Ripke et al. The largest genome-wide association study for schizophrenia to date 51,695 individuals Detects 5 new genes and two genes already known These gene findings provide new insight into the pathogenesis of schizophrenia
5 years of cluster computing in genetic association: Has yielded genes for: depression, schizophrenia, coffee drinking, brain volume, ADHD, migraine, educational attainment, IQ, bipolar disorder, attention problems, cannabis use, smoking, anxiety, processing speed, optic disc size, physical exercise, personality, neuroticism and more.. Has provided computing power for developing multiple novel statistical genetic methods
40 years of cluster computing in genetic association is expected to provide even more insight into the genetic variation underlying risk of disease. Gene finding for polygenic disorders is unthinkable without cluster computing!
Acknowledgements All the SARA staff that enable efficient use of GCC/LISA