Supplementary Material to "Genome-wide association study identifies new HLA Class II haplotypes strongly protective against narcolepsy"

Hyun Hor 1,2, Zoltán Kutalik 3,4, Yves Dauvilliers 2,5, Armand Valsesia 3,4,6, Gert J. Lammers 7, Claire E.H.M. Donjacour 7, Alex Iranzo 8, Joan Santamaria 8, Rosa Peraita Adrados 9, José L. Vicario 10, Sebastiaan Overeem 11,12, Isabelle Arnulf 13, Ioannis Theodorou 14, Poul Jennum 15, Stine Knudsen 15, Claudio Bassetti 16, Johannes Mathis 17, Michel Lecendreux 18, Geert Mayer 19, Peter Geisler 20, Antonio Benetó 21, Brice Petit 1, Corinne Pfister 1, Julie Vienne Bürki 1, Gérard Didelot 1, Michel Billiard 5, Guadalupe Ercilla 22, Willem Verduijn 23, Frans H.J. Claas 23, Peter Vollenweider 24, Gerard Waeber 24, Dawn M. Waterworth 25, Vincent Mooser 25, Raphaël Heinzer 26, Jacques S. Beckmann 3,27, Sven Bergmann 3,4 and Mehdi Tafti 1,26 *
Supplementary Note

Extended 8.1 haplotype analysis
We estimated the extent of the discovered protective haplotype (DRB1*1301-DQB1*0603) by phasing our genotype data (using PLINK 1) to reveal all HLA 8.1 ancestral haplotypes present in our population. We then selected individuals who carried exactly one copy of the DRB1*1301-DQB1*0603 haplotype and one copy of the DRB1*1501-DQB1*0602 haplotype. Next, we examined whether they shared the HLA-A*0101-HLA-B*0801 haplotype, in which case the DRB1*1301-DQB1*0603 signal could extend to the full 8.1 extended ancestral haplotype. The tagging SNPs rs1611635 and rs10947089 were used for HLA-A, and rs6457374 and rs2844535 for HLA-B. Only the CATT haplotype was shared among the DRB1*1301-DQB1*0603 controls, but this haplotype was also frequent in non-carriers of DRB1*1301-DQB1*0603 (Supplementary Figure 1a). This excludes the possibility that the protective signal extends beyond the HLA Class II region. For DRB1*0301-DQB1*0200, CACT is the only haplotype found in (almost) all DRB1*0301-DQB1*0200 carriers, but it is also frequent among non-carriers (Supplementary Figure 1b). This again illustrates that the newly discovered signal cannot extend beyond the HLA Class II region.

CNV analysis
For the CNV analysis, we derived copy number (CN) ratios from the raw CEL files. These ratios allowed us to estimate whether a genomic region is duplicated or deleted relative to a reference. All normalization steps were performed with the aroma.affymetrix framework 1 and included allelic cross-talk calibration 2 to correct for differences between SNPs and for offset in CN probes, intensity summarization using Robust Median Average, and correction for the PCR amplification bias inherent to the Affymetrix SNP platform. To estimate the CN ratio for a given sample at a given SNP or CN probe, we computed the log2 ratio of the normalized intensity of this probe to its median across all samples from the same batch.
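The batch-median reference step can be illustrated with a minimal NumPy sketch. It assumes the intensities have already been calibrated and summarized (the actual pipeline used the R package aroma.affymetrix; this Python version only shows the arithmetic of the ratio), and the function name log2_cn_ratios is hypothetical.

```python
import numpy as np


def log2_cn_ratios(intensities, batches):
    """Return log2(intensity / within-batch probe median) for every sample and probe.

    intensities: array of shape (n_samples, n_probes) with normalized, strictly
        positive probe intensities (calibration and summarization assumed done).
    batches: sequence of n_samples batch labels.
    """
    intensities = np.asarray(intensities, dtype=float)
    batches = np.asarray(batches)
    ratios = np.empty_like(intensities)
    for batch in np.unique(batches):
        rows = batches == batch
        # Reference: median of each probe across all samples of the same batch.
        reference = np.median(intensities[rows], axis=0)
        ratios[rows] = np.log2(intensities[rows] / reference)
    return ratios


# Toy example: three samples from one batch, two probes.
example = log2_cn_ratios([[100, 50], [200, 100], [400, 25]], ["A", "A", "A"])
print(example)  # rows hold the values [-1, 0], [0, 1], [1, -1]
```

A ratio of 0 means the probe matches the batch median; +1 and -1 correspond to a doubling or halving of the signal. Using a per-batch rather than a global median is what keeps batch-level hybridization differences out of the reference.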
Raw copy number ratios were smoothed along the physical position using Loess filtering with a window of 41 probes. Next, a four-component Gaussian mixture model (one component for each copy number state: deletion, copy-neutral, one additional copy and two additional copies) was fitted to the smoothed copy number ratios, with a constraint on the difference between the mixture means. For each individual, we then determined the posterior probability of each copy number state. The copy number was finally determined as the expected copy number (dosage), rounded to the nearest integer. Because CNV calls showed batch-specific hybridization efficiency, we used the batch incidence matrix as a covariate, in addition to the principal components, to correct for the batch effect. This substantially reduced the effective sample size, since batches containing only cases or only controls were effectively ignored (within such a batch, the batch indicator fully explains case status). The subsequent genome-wide search for association between narcolepsy and CNVs revealed three genomic regions within the HLA class II region (best-hit probe CN_425606, P = 4 × 10^-11). These CNV regions are part of previously identified CNVs 3-8: (1) the first region spans approximately 47 kb (32560758-32607481) and contains HLA-DRB5; (2) the second region (32705875-32733668) is approximately 28 kb long and contains the HLA-DQA1 gene; (3) the last region (32796767-32805569) is approximately 9 kb long and lies between DQB1 and DQA2 (Supplementary Figure 2). No other genomic region showed significant (P < 10^-6) association with narcolepsy in our sample. However, because genomic rearrangements within the HLA Class II region, due to the presence or absence of different DRB genes and several pseudogenes, give rise to at least 8 different haplotypes, the 3 identified CNVs may not be independent of HLA class II haplotypes. Accordingly, qPCR analysis indicated that the first CNV mainly tags the presence of DRB5, which is present only on DR51 (DRB1*15 and DRB1*16) haplotypes.
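The smoothing and calling steps can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the Loess variant shown is a local linear fit with tricube weights over the 41 nearest probes (the source does not specify degree or span definition), the four states are assumed to correspond to 1, 2, 3 and 4 copies, and the mixture parameters are taken as already fitted (the constrained fit itself is not shown). Function names are hypothetical.

```python
import numpy as np


def loess_smooth(positions, values, window=41):
    """Local linear (LOESS-style) smoother using the `window` nearest probes.

    positions: physical positions of the probes on one chromosome.
    values: raw log2 copy number ratios for one individual at those probes.
    """
    x = np.asarray(positions, dtype=float)
    y = np.asarray(values, dtype=float)
    k = min(window, len(x))
    smoothed = np.empty_like(y)
    for i in range(len(x)):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist, kind="stable")[:k]  # the k nearest probes
        dmax = dist[idx].max()
        if dmax == 0:
            smoothed[i] = y[idx].mean()
            continue
        u = (x[idx] - x[i]) / dmax                  # scaled offsets in [-1, 1]
        w = (1 - np.abs(u) ** 3) ** 3               # tricube weights
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(k), u])   # local line: a + b*u
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y[idx] * sw, rcond=None)
        smoothed[i] = coef[0]                       # fitted value at x[i] (u = 0)
    return smoothed


def expected_copy_number(ratios, means, sds, weights, copy_numbers=(1, 2, 3, 4)):
    """Posterior state probabilities, expected copy number (dosage) and rounded call.

    means/sds/weights describe an already-fitted Gaussian mixture with one
    component per state (assumed: deletion, copy-neutral, +1 copy, +2 copies).
    """
    r = np.asarray(ratios, dtype=float)[:, None]
    means, sds, weights = (np.asarray(a, dtype=float) for a in (means, sds, weights))
    # Log of weight * normal density; the common 1/sqrt(2*pi) factor cancels.
    log_dens = np.log(weights) - np.log(sds) - 0.5 * ((r - means) / sds) ** 2
    log_dens -= log_dens.max(axis=1, keepdims=True)  # guard against underflow
    post = np.exp(log_dens)
    post /= post.sum(axis=1, keepdims=True)
    dosage = post @ np.asarray(copy_numbers, dtype=float)
    return post, dosage, np.rint(dosage).astype(int)


# Hypothetical mixture parameters, for illustration only.
post, dosage, calls = expected_copy_number(
    [-1.0, 0.0, 1.0], means=(-1.0, 0.0, 0.58, 1.0), sds=(0.1,) * 4, weights=(0.25,) * 4)
print(calls)  # [1 2 4]
```

Keeping the dosage (the posterior-weighted copy number) before rounding lets the association test use the uncertainty of each call; np.rint rounds exact halves to the nearest even integer.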
Note that more cases than controls are DRB1*1501 homozygous, and that among DRB1*1501 heterozygotes, DRB1*1601 is significantly more frequent in cases. The second CNV could not be confirmed by qPCR, which did not detect any additional copies. This apparent (pseudo-)CNV may stem from the high polymorphism of the region, which results in variable hybridization signals. The third CNV corresponds to a recombination hot spot immediately centromeric to DQB1. Note that SNP rs2858884 (position 32,808,061) lies just 2.5 kb centromeric of this CNV region. Again, this CNV could not be confirmed to be independent of HLA haplotypes, indicating that these CNVs are common genomic rearrangements reliably tagged by SNPs and HLA haplotypes, as recently also found for several other HLA-associated disorders 9.
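The lengths of the three CNV regions and the distance between rs2858884 and the third region can be computed directly from the reported coordinates. In this small check, length is taken as end minus start, the genome build is not stated in the source, and the region labels are hypothetical shorthand.

```python
# Coordinates (bp) as reported in the text.
cnv_regions = {
    "CNV1 (HLA-DRB5)": (32560758, 32607481),
    "CNV2 (HLA-DQA1)": (32705875, 32733668),
    "CNV3 (DQB1-DQA2)": (32796767, 32805569),
}
rs2858884_pos = 32808061

lengths_kb = {name: (end - start) / 1000 for name, (start, end) in cnv_regions.items()}
for name, kb in lengths_kb.items():
    print(f"{name}: {kb:.1f} kb")
# CNV1 (HLA-DRB5): 46.7 kb
# CNV2 (HLA-DQA1): 27.8 kb
# CNV3 (DQB1-DQA2): 8.8 kb

gap_kb = (rs2858884_pos - cnv_regions["CNV3 (DQB1-DQA2)"][1]) / 1000
print(f"rs2858884 lies {gap_kb:.1f} kb beyond the end of CNV3")  # 2.5 kb
```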
Supplementary Table 1. Genome-wide association top hit SNPs (P < 10^-5)

Chr  Pos        rs#         A  B  r-sq-hat  AF (control)  AF (case)  OR    OR CI95      P         Gene     Distance (kb)*
4    70294133   rs17147266  A  G  0.91      0.02          0.03       0.23  [0.12-0.44]  6.50E-06  UGT2B4   86
6    29464566   rs379157    A  G  0.92      0.45          0.37       1.56  [1.30-1.87]  1.44E-06  OR12D2   8
6    29674348   rs3095273   A  G  0.91      0.69          0.61       1.59  [1.31-1.93]  2.04E-06  GABBR1   4
10   109151635  rs11193407  A  C  1         0.06          0.1        0.49  [0.36-0.67]  7.37E-06  SORCS1   237
10   109161961  rs10787024  C  T  1         0.94          0.9        2.05  [1.50-2.81]  7.32E-06  SORCS1   248
13   24144838   rs9511411   C  T  0.77      0.36          0.31       1.67  [1.33-2.09]  7.54E-06  ATP12A   8
18   6989628    rs625106    C  T  1         0.54          0.63       0.67  [0.57-0.79]  3.38E-06  LAMA1    0

Chr: chromosome; Pos: physical map position; r-sq-hat: imputation quality; AF: allele frequency; OR: odds ratio; CI95: 95% confidence interval. * A distance of 0 indicates that the SNP maps within the indicated gene. When multiple hits below the threshold lie within 10 kb of each other, only the one with the lowest P value is reported.
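The 10 kb reporting rule can be sketched as a greedy distance-based clumping step. This is one reading of the rule, stated as an assumption: significant hits are visited from lowest to highest P, and a hit is kept only if no already-kept hit lies on the same chromosome within 10 kb. A transitive (single-linkage) grouping could give different results for chains of nearby SNPs. The Hit type, clump_hits and the third SNP below are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    rsid: str
    chrom: int
    pos: int  # physical position (bp)
    p: float


def clump_hits(hits: List[Hit], threshold: float = 1e-5, window_bp: int = 10_000) -> List[Hit]:
    """Report only the lowest-P hit among significant hits within `window_bp` of each other."""
    significant = sorted((h for h in hits if h.p < threshold), key=lambda h: h.p)
    kept: List[Hit] = []
    for hit in significant:
        # Skip this hit if a stronger hit was already kept nearby on the same chromosome.
        if not any(k.chrom == hit.chrom and abs(k.pos - hit.pos) <= window_bp for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: (h.chrom, h.pos))


hits = [
    Hit("rs11193407", 10, 109151635, 7.37e-6),
    Hit("rs10787024", 10, 109161961, 7.32e-6),
    Hit("rs_hypothetical", 10, 109155000, 9.0e-6),  # made-up nearby hit
]
print([h.rsid for h in clump_hits(hits)])  # ['rs11193407', 'rs10787024']
```

The two SORCS1 SNPs in the table are 10,326 bp apart, so under this rule both are reported, whereas the made-up SNP 6,961 bp from the stronger rs10787024 is dropped.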
Supplementary Table 2. Replication attempt of two selected top hits from Supplementary Table 1

Chr  Pos       rs#        Gene    Distance to gene (kb)  Risk allele  Other allele  r-sq-hat
13   24144838  rs9511411  ATP12A  8                      T            C             0.79
18   6989628   rs625106   LAMA1   0                      C            T             1

rs#        Stage        Freq. controls  Freq. cases  OR    OR CI95      P
rs9511411  discovery    0.36            0.39         1.67  [1.33-2.09]  7.54E-06
rs9511411  replication  0.37            0.41         1.16  [0.88-1.30]  0.9
rs9511411  combined     0.36            0.40         1.21  [1.11-1.32]  1.89E-05
rs625106   discovery    0.37            0.46         1.49  [1.27-1.75]  3.38E-06
rs625106   replication  0.43            0.46         1.15  [0.87-1.30]  0.8
rs625106   combined     0.4             0.46         1.18  [1.09-1.27]  2.01E-05

Freq.: frequency of the effect (risk) allele. Other abbreviations as in Supplementary Table 1.

Supplementary Table 3. Genome-wide analysis of SNP associations (P < 10^-5) in HLA-DRB1*1501-positive individuals

Chr  Pos        rs#         A  B  r-sq-hat  AF (control)  AF (case)  OR    OR CI95      P         Gene      Distance (kb)
4    9640261    rs6856396   A  T  1         0.82          0.89       1.81  [1.40-2.34]  5.14E-06  SLC2A9    0
6    32320211   rs9267948   A  G  0.96      0.57          0.67       1.74  [1.37-2.22]  5.68E-06  NOTCH4    20
6    32413957   rs3129900   G  T  1         0.52          0.42       0.42  [0.30-0.58]  1.14E-07  C6orf10   0
6    32462498   rs1555115   C  G  1         0.09          0.03       0.37  [0.25-0.55]  8.62E-07  BTNL2     8
6    32808061   rs2858884   A  C  1         0.19          0.1        0.48  [0.37-0.62]  4.65E-08  HLA-DQA2  9
7    89722103   rs11563921  A  G  1         0.89          0.95       2.35  [1.65-3.36]  2.31E-06  FLJ21062  0
12   5362721    rs1002101   A  G  0.8       0.58          0.68       1.64  [1.33-2.04]  5.98E-06  NTF3      49
12   90685515   rs10777331  C  T  0.9       0.88          0.82       0.5   [0.37-0.68]  8.94E-06  BTG1      376
12   94892386   rs2068662   A  G  0.99      0.09          0.04       0.39  [0.26-0.58]  4.67E-06  HAL       0
14   34146098   rs17102628  G  T  0.99      0.94          0.88       0.49  [0.36-0.66]  5.81E-06  SNX6      0
17   62339062   rs2109266   C  T  0.62      0.53          0.49       0.5   [0.37-0.67]  2.90E-06  CACNG5    27

Abbreviations as in Supplementary Table 1.
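The source does not state how the "combined" rows of Supplementary Table 2 were obtained. One common way to combine a discovery and a replication estimate, sketched here purely for illustration, is an inverse-variance fixed-effect meta-analysis on the log-odds scale, recovering each standard error from the reported 95% CI (which assumes the CI is symmetric on the log scale). Applied to the table's values it does not reproduce the combined rows, so it should not be read as the procedure behind them. Function names are hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5% quantile of the standard normal


def se_from_ci(lower: float, upper: float) -> float:
    """Standard error of log(OR) recovered from a symmetric 95% CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)


def fixed_effect_meta(studies):
    """Inverse-variance fixed-effect pooling of (odds_ratio, ci_lower, ci_upper) tuples.

    Returns (pooled OR, (lower, upper) 95% CI, two-sided P value).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for odds_ratio, lower, upper in studies:
        weight = 1.0 / se_from_ci(lower, upper) ** 2
        total_weight += weight
        weighted_sum += weight * math.log(odds_ratio)
    beta = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    ci = (math.exp(beta - Z_975 * se), math.exp(beta + Z_975 * se))
    return math.exp(beta), ci, p


# Toy input: two identical studies, OR 2 with 95% CI [1, 4].
pooled_or, pooled_ci, pooled_p = fixed_effect_meta([(2.0, 1.0, 4.0), (2.0, 1.0, 4.0)])
print(pooled_or, pooled_ci, pooled_p)
```

Pooling two identical studies leaves the OR unchanged and shrinks the standard error by a factor of the square root of 2, which narrows the CI and lowers the P value.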
Supplementary Figure 1. Phased extended 8.1 haplotype structure of the HLA-DRB1*1501/HLA-DRB1*03-DQB1*02
(a) Extended 8.1 haplotype configurations for DRB1*1301-DQB1*0603 carriers and non-carriers. No single HLA-A-HLA-B haplotype can distinguish between DRB1*1301-DQB1*0603 carriers and non-carriers. (b) Extended 8.1 haplotype configurations for DRB1*03-DQB1*02 carriers and non-carriers. Here, again, no single HLA-A-HLA-B haplotype can distinguish between DRB1*03-DQB1*02 carriers and non-carriers.
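The carrier versus non-carrier comparison behind this figure can be sketched as a simple tabulation over phased tag-SNP haplotypes. Assumptions: each individual is given as a carrier flag plus a phased pair of 4-allele haplotype strings in the order of the tagging SNPs; only CATT and CACT come from the text, and TGCA and the toy individuals are made up. A haplotype is counted once per individual, so homozygotes are not double-counted. Function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Individual = Tuple[bool, Tuple[str, str]]  # (is_carrier, phased pair of tag-SNP haplotypes)


def haplotype_carrier_rates(individuals: Iterable[Individual]) -> Dict[str, Tuple[float, float]]:
    """For each haplotype: fraction of carriers and of non-carriers carrying it at least once."""
    people = list(individuals)
    carriers = [set(pair) for is_carrier, pair in people if is_carrier]
    others = [set(pair) for is_carrier, pair in people if not is_carrier]

    def rate(group: List[set], haplotype: str) -> float:
        return sum(haplotype in haps for haps in group) / len(group) if group else float("nan")

    all_haplotypes = sorted(set().union(*carriers, *others))
    return {h: (rate(carriers, h), rate(others, h)) for h in all_haplotypes}


def shared_by_carriers(rates: Dict[str, Tuple[float, float]], min_fraction: float = 1.0) -> List[str]:
    """Haplotypes present in at least `min_fraction` of carriers."""
    return [h for h, (in_carriers, _) in rates.items() if in_carriers >= min_fraction]


toy = [
    (True, ("CATT", "TGCA")),
    (True, ("CATT", "CACT")),
    (False, ("CATT", "TGCA")),
    (False, ("CACT", "TGCA")),
]
rates = haplotype_carrier_rates(toy)
print(shared_by_carriers(rates))  # ['CATT']
print(rates["CATT"])              # (1.0, 0.5)
```

A haplotype shared by all carriers only points to an extended signal if it is also rare in non-carriers; in the toy data CATT is carried by every carrier but also by half of the non-carriers, so it cannot distinguish the two groups. Lowering min_fraction mirrors the "(almost) all carriers" wording.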
Supplementary Figure 2. Manhattan plot for CNV and SNP associations in the HLA region
(a) Manhattan plot for SNP and CNV associations in the HLA region. For this analysis, we used the batch incidence matrix as a covariate in the logistic regression (for both SNPs and CNVs, to keep the results comparable). The top CNV and SNP signals are of the same strength; hence, causality cannot be determined. (b) Manhattan plot for SNP and CNV associations in the HLA region for HLA-positive individuals only, corrected for the copy number of the HLA-DRB1*1501 haplotype. For this analysis, we used the batch incidence matrix and the rs3135388 genotype as covariates in the logistic regression (again for both SNPs and CNVs, to keep the results comparable). HLA-DRB1*1501-independent CNV signals emerge in the DRB5 and DQA1 regions.
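The per-marker model described in this caption, a logistic regression of case status on a SNP or CNV dosage with the batch incidence matrix as covariates, can be sketched in NumPy as follows. This is a sketch under assumptions: fitting by Newton-Raphson with a Wald test, the first batch used as reference level, and all function names hypothetical. The rs3135388 genotype of panel (b), or principal components, would simply be extra covariate columns. Batches with only cases or only controls are removed first because their indicator perfectly separates case status, so the maximum-likelihood estimate would not exist.

```python
import math

import numpy as np


def informative_mask(y, batches):
    """True for samples whose batch contains both cases and controls."""
    y, batches = np.asarray(y), np.asarray(batches)
    mask = np.zeros(len(y), dtype=bool)
    for batch in np.unique(batches):
        rows = batches == batch
        mask[rows] = len(np.unique(y[rows])) == 2
    return mask


def batch_incidence(batches):
    """One 0/1 indicator column per batch, dropping the first batch as reference."""
    batches = np.asarray(batches)
    labels = np.unique(batches)
    return (batches[:, None] == labels[None, 1:]).astype(float)


def logistic_wald(y, dosage, covariates, max_iter=50, tol=1e-10):
    """Odds ratio per unit of dosage and its two-sided Wald P value.

    Fits logit P(case) = b0 + b1 * dosage + covariates @ gamma by Newton-Raphson.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(dosage, dtype=float), covariates])
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (mu * (1 - mu))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    cov = np.linalg.inv(X.T @ (X * (mu * (1 - mu))[:, None]))
    z = beta[1] / math.sqrt(cov[1, 1])
    return math.exp(beta[1]), math.erfc(abs(z) / math.sqrt(2))


# Simulated data, for illustration only.
rng = np.random.default_rng(0)
n = 2000
batches = rng.choice(np.array(["b1", "b2", "b3"]), size=n)
dosage = rng.binomial(2, 0.3, size=n)
shift = {"b1": 0.0, "b2": 0.4, "b3": -0.3}  # simulated batch effects
eta = -0.5 + 0.7 * dosage + np.array([shift[b] for b in batches])
y = (rng.random(n) < 1 / (1 + np.exp(-eta))).astype(int)
keep = informative_mask(y, batches)
odds_ratio, p = logistic_wald(y[keep], dosage[keep], batch_incidence(batches[keep]))
print(round(odds_ratio, 2), p)  # OR should lie near exp(0.7), roughly 2
```

Dropping one batch column avoids collinearity with the intercept; using identical covariates for SNPs and CNVs is what keeps their P values comparable on the same plot.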
References
1. Bengtsson, H., Simpson, K., Bullard, J. & Hansen, K. A generic framework in R for analyzing small to very large Affymetrix data sets in bounded memory. Tech. Report #745 (Department of Statistics, University of California, Berkeley, 2008).
2. Bengtsson, H., Irizarry, R., Carvalho, B. & Speed, T.P. Estimation and assessment of raw copy numbers at the single locus level. Bioinformatics 24, 759-67 (2008).
3. Korbel, J.O. et al. Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome. Proc Natl Acad Sci U S A 104, 10110-5 (2007).
4. McCarroll, S.A. et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet 40, 1166-74 (2008).
5. Perry, G.H. et al. Copy number variation and evolution in humans and chimpanzees. Genome Res 18, 1698-710 (2008).
6. Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444-54 (2006).
7. Wheeler, D.A. et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872-6 (2008).
8. Wong, K.K. et al. A comprehensive analysis of common copy-number variations in the human genome. Am J Hum Genet 80, 91-104 (2007).
9. Wellcome Trust Case Control Consortium. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature 464, 713-20 (2010).