Supplementary Material to: Genome-wide association study identifies new HLA Class II haplotypes strongly protective against narcolepsy


Hyun Hor,1,2 Zoltán Kutalik,3,4 Yves Dauvilliers,2,5 Armand Valsesia,3,4,6 Gert J. Lammers,7 Claire E.H.M. Donjacour,7 Alex Iranzo,8 Joan Santamaria,8 Rosa Peraita Adrados,9 José L. Vicario,10 Sebastiaan Overeem,11,12 Isabelle Arnulf,13 Ioannis Theodorou,14 Poul Jennum,15 Stine Knudsen,15 Claudio Bassetti,16 Johannes Mathis,17 Michel Lecendreux,18 Geert Mayer,19 Peter Geisler,20 Antonio Benetó,21 Brice Petit,1 Corinne Pfister,1 Julie Vienne Bürki,1 Gérard Didelot,1 Michel Billiard,5 Guadalupe Ercilla,22 Willem Verduijn,23 Frans H.J. Claas,23 Peter Vollenwider,24 Gerard Waeber,24 Dawn M. Waterworth,25 Vincent Mooser,25 Raphaël Heinzer,26 Jacques S. Beckmann,3,27 Sven Bergmann,3,4 and Mehdi Tafti,1,26*

Supplementary Note

Extended 8.1 haplotype analysis

We estimated the extent of the discovered protective haplotype (DRB1*1301-DQB1*0603) by phasing our genotype data (using PLINK 1) to reveal all HLA 8.1 ancestral haplotypes present in our population. We then selected individuals who carried exactly one copy of the DRB1*1301-DQB1*0603 haplotype and one copy of DRB1*1501-DQB1*0602. Next, we examined whether they carried a shared HLA-A*0101-HLA-B*0801 haplotype, in which case the DRB1*1301-DQB1*0603 signal could extend to the full 8.1 extended ancestral haplotype. Tagging SNPs rs1611635 and rs10947089 were used for HLA-A, and rs6457374 and rs2844535 for HLA-B. Only the CATT haplotype was shared among the DRB1*1301-DQB1*0603 controls, but this haplotype was also frequently found in non-carriers of DRB1*1301-DQB1*0603 (Supplementary Figure 2a). This rules out an extension of the protective signal beyond the HLA Class II region. For DRB1*0301-DQB1*0200, we found that CACT is the only haplotype present in (almost) all DRB1*0301-DQB1*0200 carriers, but this haplotype is also frequent among non-carriers. This again illustrates that the newly discovered signal cannot extend beyond the HLA Class II region.

CNV analysis

For the CNV analysis we derived copy number (CN) ratios from the raw CEL files. These ratios allowed us to estimate whether a genomic region is duplicated or deleted compared to a reference. All normalization steps were performed with the Aroma.affymetrix framework (ref. 1) and included allelic cross-talk calibration (ref. 2) to correct for differences between SNPs and for offset in CN probes, intensity summarization using Robust Median Average, and correction for any PCR amplification bias inherent to the Affymetrix SNP platform. To estimate the CN ratio for a given sample at a given SNP or CN probe, we computed the log2 ratio of the normalized intensity of this probe divided by the median intensity across all samples from the same batch.
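The per-probe CN ratio described above (log2 of a sample's normalized intensity over the median of its batch) can be illustrated with a minimal sketch. This is not the Aroma.affymetrix implementation; the function name `log2_cn_ratios`, the dictionary layout, and the intensity values are assumptions made purely for illustration.

```python
import math
from statistics import median


def log2_cn_ratios(intensities, batches):
    """Return sample -> log2(intensity / batch median) for a single probe.

    intensities: dict sample -> normalized intensity (assumed > 0)
    batches:     dict sample -> batch label
    Both structures are hypothetical, chosen only for this sketch.
    """
    # Group the intensities by batch so each batch gets its own reference.
    by_batch = {}
    for sample, value in intensities.items():
        by_batch.setdefault(batches[sample], []).append(value)
    batch_median = {b: median(v) for b, v in by_batch.items()}
    # Ratio of each sample to the median of its own batch, on the log2 scale.
    return {
        s: math.log2(v / batch_median[batches[s]])
        for s, v in intensities.items()
    }


# Made-up intensities for one probe in two batches.
intens = {"s1": 100.0, "s2": 200.0, "s3": 400.0, "s4": 50.0, "s5": 50.0}
batch = {"s1": "A", "s2": "A", "s3": "A", "s4": "B", "s5": "B"}
ratios = log2_cn_ratios(intens, batch)
```

Using the batch median as the reference means that a probe with a batch-wide shift in intensity yields ratios centred on zero within that batch, which is why values near 0 are read as copy-neutral.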

Raw copy number ratios were smoothed along physical position using Loess filtering with a 41-probe window. Next, a four-component Gaussian mixture model (one component for each of the following copy number states: deletion, copy-neutral, and one or two additional copies) was fitted to the smoothed copy number ratios, with a constraint on the difference between the mixture means. For each individual we then determined the probability of each copy number state. The final copy number was taken as the expected copy number (dosage), rounded to the nearest integer. Since CNV calls showed batch-specific hybridization efficiency, we used the batch-incidence matrix as a covariate, in addition to the principal components, to correct for the batch effect. This substantially reduced the effective sample size, since batches containing only cases or only controls were ignored.

The subsequent genome-wide search for association between narcolepsy and CNVs revealed three genomic regions within the HLA class II region (best hit probe CN_425606, P = 4 x 10^-11). These CNV regions are part of previously identified CNVs (refs. 3-8): (1) the first region extends approximately 40kb (32560758-32607481) and contains HLA-DRB5; (2) the second region (32705875-32733668) is 25kb long and contains the HLA-DQA1 gene; (3) the last region (32796767-32805569) is 10kb long and lies between DQB1 and DQA2 (Supplementary Figure 1). No other genomic region showed significant (P < 10^-6) association with narcolepsy in our sample.

However, genomic reorganizations within the HLA Class II region, due to the presence or absence of different DRB1 genes and several pseudogenes, give rise to at least 8 different haplotypes, so the 3 identified CNVs may not be independent of HLA class II haplotypes. Accordingly, qPCR analysis indicated that the first CNV mainly tags the presence of DRB5, which is only present within the DR51 (DRB1*15 and DRB1*16) haplotype. Note that more cases are DRB1*1501 homozygous and that, among DRB1*1501 heterozygotes, DRB1*1601 is significantly increased in cases. The second CNV could not be confirmed by qPCR, which did not reveal multiple copy numbers. This apparent (pseudo) CNV might stem from the high polymorphism of the region, which results in different hybridization signals. The third CNV corresponds to a recombination hot spot immediately centromeric to DQB1. Note that SNP rs2858884 (32,808,061) is located just 2.5kb centromeric of this CNV region. Again, this CNV could not be confirmed to be independent of HLA haplotypes, indicating that these CNVs are common genomic reorganizations reliably tagged by SNPs and HLA haplotypes, as recently found also in several other HLA-associated disorders (ref. 9).
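The copy-number calling step in the CNV analysis (posterior state probabilities from the Gaussian mixture, then the expected copy number rounded to the nearest integer) can be sketched as follows. The mixture parameters below are hypothetical, not the fitted values from the study: the component means are assumed to be log2(c/2) for c = 1, 2, 3, 4 copies, and the standard deviation and weights are invented for illustration. The mapping of "deletion" to one copy is likewise an assumption.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def copy_number_dosage(ratio, means, sds, weights, states=(1, 2, 3, 4)):
    """Posterior state probabilities, expected copy number, and integer call.

    states are the copy numbers assumed for the four mixture components
    (deletion, copy-neutral, one and two additional copies).
    """
    dens = [w * normal_pdf(ratio, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(dens)
    post = [d / total for d in dens]              # posterior per state
    dosage = sum(c * p for c, p in zip(states, post))  # expected copy number
    return post, dosage, round(dosage)


# Hypothetical mixture parameters (NOT estimates from the study).
MEANS = [math.log2(c / 2.0) for c in (1, 2, 3, 4)]
SDS = [0.15, 0.15, 0.15, 0.15]
WEIGHTS = [0.05, 0.90, 0.04, 0.01]

post_neutral, dosage_neutral, call_neutral = copy_number_dosage(0.0, MEANS, SDS, WEIGHTS)
post_del, dosage_del, call_del = copy_number_dosage(-1.0, MEANS, SDS, WEIGHTS)
```

Working with the dosage rather than the single most probable state keeps the uncertainty of ambiguous probes in the value until the final rounding step.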

Supplementary Table 1. Genome-wide association top hit SNPs (p < 10^-5)

Chr  Pos        rs#         A  B  r-sq-hat  AF    AF (case)  Odds ratio  Odds ratio CI95  P         Gene    Distance (kb)*
4    70294133   rs17147266  A  G  0.91      0.02  0.03       0.23        [0.12-0.44]      6.50E-06  UGT2B4  86
6    29464566   rs379157    A  G  0.92      0.45  0.37       1.56        [1.30-1.87]      1.44E-06  OR12D2  8
6    29674348   rs3095273   A  G  0.91      0.69  0.61       1.59        [1.31-1.93]      2.04E-06  GABBR1  4
10   109151635  rs11193407  A  C  1         0.06  0.1        0.49        [0.36-0.67]      7.37E-06  SORCS1  237
10   109161961  rs10787024  C  T  1         0.94  0.9        2.05        [1.50-2.81]      7.32E-06  SORCS1  248
13   24144838   rs9511411   C  T  0.77      0.36  0.31       1.67        [1.33-2.09]      7.54E-06  ATP12A  8
18   6989628    rs625106    C  T  1         0.54  0.63       0.67        [0.57-0.79]      3.38E-06  LAMA1   0

Chr: chromosome; Pos: physical map position; r-sq-hat: imputation quality; AF: frequency; CI95: 95% confidence interval. * A distance of 0 indicates that the SNP maps within the indicated gene. When multiple hits below the threshold lie within 10kb of each other, only the one with the lowest p-value is reported.
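The reporting rule in the table footnote (of several hits within 10kb of each other, only the one with the lowest p-value is listed) can be sketched as a greedy filter. The greedy order and the function name `prune_hits` are assumptions, since the text does not say how the rule was applied. The first two hits are the SORCS1 rows from the table; the third is a made-up entry added only to show a hit being dropped.

```python
def prune_hits(hits, window_bp=10_000):
    """Keep the lowest-p hit among hits within window_bp on the same chromosome.

    hits: list of (chrom, pos, name, p) tuples.
    Greedy sketch: visit hits from smallest to largest p and keep a hit
    only if no already-kept hit on the same chromosome lies within window_bp.
    """
    kept = []
    for chrom, pos, name, p in sorted(hits, key=lambda h: h[3]):
        clash = any(
            chrom == k_chrom and abs(pos - k_pos) <= window_bp
            for k_chrom, k_pos, _, _ in kept
        )
        if not clash:
            kept.append((chrom, pos, name, p))
    # Return in genomic order for readability.
    return sorted(kept, key=lambda h: (h[0], h[1]))


hits = [
    (10, 109151635, "rs11193407", 7.37e-6),      # from Supplementary Table 1
    (10, 109161961, "rs10787024", 7.32e-6),      # from Supplementary Table 1
    (10, 109155000, "hypothetical_snp", 9.0e-6), # invented for illustration
]
reported = prune_hits(hits)
reported_names = [name for _, _, name, _ in reported]
```

The two SORCS1 SNPs are 10,326 bp apart, just over the 10kb window, which is consistent with both appearing in the table; the invented hit lies within 10kb of rs10787024 and has a larger p-value, so it is filtered out.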

Supplementary Table 2. Replication attempt of two selected top hits from Supplementary Table 1

Chr  Pos       rs#        Gene    Distance to gene (kb)  Risk  Other  r-sq-hat
13   24144838  rs9511411  ATP12A  8                      T     C      0.79
18   6989628   rs625106   LAMA1   0                      C     T      1

rs#        Stage        Effect freq (controls)  Effect freq (cases)  Odds ratio  Odds ratio CI95  P
rs9511411  discovery    0.36                    0.39                 1.67        [1.33-2.09]      7.54 x 10^-6
rs9511411  replication  0.37                    0.41                 1.16        [0.88-1.30]      0.9
rs9511411  combined     0.36                    0.40                 1.21        [1.11-1.32]      1.89 x 10^-5
rs625106   discovery    0.37                    0.46                 1.49        [1.27-1.75]      3.38 x 10^-6
rs625106   replication  0.43                    0.46                 1.15        [0.87-1.30]      0.8
rs625106   combined     0.4                     0.46                 1.18        [1.09-1.27]      2.01 x 10^-5

Abbreviations as in Supplementary Table 1.

Supplementary Table 3. Genome-wide analysis of SNP associations (p < 10^-5) in HLA-DRB1*1501 positive individuals

Chr  Pos       rs#         A  B  r-sq-hat  AF    AF (case)  Odds ratio  Odds ratio CI95  P         Gene      Distance (kb)
4    9640261   rs6856396   A  T  1         0.82  0.89       1.81        [1.40-2.34]      5.14E-06  SLC2A9    0
6    32320211  rs9267948   A  G  0.96      0.57  0.67       1.74        [1.37-2.22]      5.68E-06  NOTCH4    20
6    32413957  rs3129900   G  T  1         0.52  0.42       0.42        [0.30-0.58]      1.14E-07  C6orf10   0
6    32462498  rs1555115   C  G  1         0.09  0.03       0.37        [0.25-0.55]      8.62E-07  BTNL2     8
6    32808061  rs2858884   A  C  1         0.19  0.1        0.48        [0.37-0.62]      4.65E-08  HLA-DQA2  9
7    89722103  rs11563921  A  G  1         0.89  0.95       2.35        [1.65-3.36]      2.31E-06  FLJ21062  0
12   5362721   rs1002101   A  G  0.8       0.58  0.68       1.64        [1.33-2.04]      5.98E-06  NTF3      49
12   90685515  rs10777331  C  T  0.9       0.88  0.82       0.5         [0.37-0.68]      8.94E-06  BTG1      376
12   94892386  rs2068662   A  G  0.99      0.09  0.04       0.39        [0.26-0.58]      4.67E-06  HAL       0
14   34146098  rs17102628  G  T  0.99      0.94  0.88       0.49        [0.36-0.66]      5.81E-06  SNX6      0
17   62339062  rs2109266   C  T  0.62      0.53  0.49       0.5         [0.37-0.67]      2.90E-06  CACNG5    27

Abbreviations as in Supplementary Table 1.

Supplementary Figure 1. Phased extended 8.1 haplotype structure of the HLA-DRB1*1501/HLA-DRB1*03-DQB1*02
(a) Extended 8.1 haplotype configurations for DRB1*1301-DQB1*0603 carriers and non-carriers. No single HLA-A-HLA-B haplotype can distinguish between DRB1*1301-DQB1*0603 carriers and non-carriers.
(b) Extended 8.1 haplotype configurations for DRB1*03-DQB1*02 carriers and non-carriers. Here, again, no single HLA-A-HLA-B haplotype can distinguish between DRB1*03-DQB1*02 carriers and non-carriers.

Supplementary Figure 2. Manhattan plot for CNV and SNP associations in the HLA region
(a) Manhattan plot for SNP and CNV associations in the HLA region. For this analysis we used the batch incidence matrix as covariates in the logistic regression (for both SNPs and CNVs, to keep the results comparable). The top CNV and SNP signals are of the same strength, hence causality cannot be determined.
(b) Manhattan plot for SNP and CNV associations in the HLA region for HLA positive individuals only, corrected for the copy number of the HLA-DRB1 haplotype. For this analysis we used the batch incidence matrix and the rs3135388 genotype as covariates in the logistic regression (for both SNPs and CNVs, to keep the results comparable). HLA-DRB1*1501 independent CNV signals emerge in the DRB5 and DQA1 regions.

References
1. Bengtsson, H., Simpson, K., Bullard, J. & Hansen, K. A generic framework in R for analyzing small to very large Affymetrix data sets in bounded memory, Tech. Report #745 (Department of Statistics, University of California, Berkeley, 2008).
2. Bengtsson, H., Irizarry, R., Carvalho, B. & Speed, T.P. Estimation and assessment of raw copy numbers at the single locus level. Bioinformatics 24, 759-67 (2008).
3. Korbel, J.O. et al. Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome. Proc Natl Acad Sci U S A 104, 10110-5 (2007).
4. McCarroll, S.A. et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet 40, 1166-74 (2008).
5. Perry, G.H. et al. Copy number variation and evolution in humans and chimpanzees. Genome Res 18, 1698-710 (2008).
6. Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444-54 (2006).
7. Wheeler, D.A. et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872-6 (2008).
8. Wong, K.K. et al. A comprehensive analysis of common copy-number variations in the human genome. Am J Hum Genet 80, 91-104 (2007).
9. Wellcome Trust Case Control Consortium. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature 464, 713-20 (2010).