A genome-wide association study identifies two new risk loci for Graves' disease


SUPPLEMENTARY INFORMATION FOR: A genome-wide association study identifies two new risk loci for Graves' disease

The Chinese Autoimmune Thyroid Disease Genetics Research Consortium (CAITDGRC), Xun Chu 1,2*, Chun-Ming Pan 1,3*, Shuang-Xia Zhao 1,3*, Jun Liang 4*, Guan-Qi Gao 5*, Xiao-Mei Zhang 6*, Guo-Yue Yuan 7, Chang-Gui Li 8, Li-Qiong Xue 1, Min Shen 2, Wei Liu 1, Fang Xie 1,2, Shao-Ying Yang 1, Hai-Feng Wang 2, Jing-Yi Shi 1, Wei-Wei Sun 2, Wen-Hua Du 5, Chun-Lin Zuo 1, Jin-Xiu Shi 1,2, Bing-Li Liu 1, Cui-Cui Guo 1, Ming Zhan 1, Zhao-Hui Gu 1, Xiao-Na Zhang 1, Fei Sun 1, Zhi-Quan Wang 1, Zhi-Yi Song 1, Cai-Yan Zou 4, Wei-Hua Sun 6, Ting Guo 1,3, Huang-Ming Cao 1, Jun-Hua Ma 1, Bing Han 1, Ping Li 1,3, He Jiang 1, Qiu-Hua Huang 1, Liming Liang 9, Li-Bin Liu 10, Gang Chen 11, Qing Su 12, Yong-De Peng 13, Jia-Jun Zhao 14, Guang Ning 3, Zhu Chen 1, Jia-Lun Chen 3, Sai-Juan Chen 1, Wei Huang 1,2, Huai-Dong Song 1,3

1. State Key Laboratory of Medical Genomics, Ruijin Hospital Affiliated to Shanghai Jiaotong University (SJTU) School of Medicine, Shanghai 200025, China
2. Department of Genetics, Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center, Shanghai 201203, China
3. Shanghai Institute of Endocrinology and Metabolism, Department of Endocrinology, Ruijin Hospital Affiliated to SJTU School of Medicine, Shanghai, China
4. Department of Endocrinology, The Central Hospital of Xuzhou Affiliated to Xuzhou Medical College, Xuzhou, Jiangsu Province 221009, China
5. Department of Endocrinology, The People's Hospital of Linyi, 27 Liberation Road, Linyi, Shandong Province 276003, China
6. Department of Endocrinology, The First Hospital Affiliated to Bengbu Medical College, 287 Changhuai Road, Bengbu, Anhui Province 233004, China
7. Department of Endocrinology, The Hospital Affiliated to Jiangsu University, Zhenjiang, Jiangsu Province 212001, China
8. Department of Endocrinology and Gout Laboratory, Medical School Hospital of Qingdao University, 16 Jiangsu Road, Qingdao 266003, China
9. Department of Epidemiology and Biostatistics, Harvard School of Public Health, Building 2, Room 211A, 655 Huntington Ave, Boston, Massachusetts 02115, USA
10. Department of Endocrinology, Xiehe Hospital Affiliated to Fujian Medical University, 29 Xinquan Road, Fuzhou, Fujian Province 350001, China
11. Department of Endocrinology, Fujian Province Hospital, 134 East Street, Fuzhou, Fujian Province 350001, China
12. Department of Endocrinology, Xin-Hua Hospital Affiliated to Shanghai Jiaotong University (SJTU) School of Medicine, Shanghai 200092, China
13. Department of Endocrinology, The First People's Hospital Affiliated to Shanghai Jiaotong University, Shanghai 200080, China
14. Department of Endocrinology, Shandong Province Hospital, Shandong University, 324 Jing 5 Road, Jinan 250021, China

* These authors contributed equally to this work.
These authors are co-corresponding authors. To whom correspondence should be addressed at huaidong_s1966@163.com, huangwei@chgc.sh.cn, sjchen@stn.sh.cn, xuyanrr@yahoo.com.cn, zchen@stn.sh.cn or guangning@medmail.com.cn

Supplementary Note

Study subjects

All samples in this study were recruited from the Chinese Han population through collaboration with multiple hospitals in China [1,2]. Informed consent was obtained from all subjects using protocols approved by the local institutional review board. We collected 5 ml blood samples from each participant for DNA preparation and biochemical measurements.

In the first stage of investigation using GWAS, 1,536 patients with Graves' disease (GD) were recruited (Supplementary Table 9) [1,2]. Diagnosis of GD was based on documented clinical and biochemical evidence of hyperthyroidism, diffuse goiter, and the presence of at least one of the following: positive TSH receptor antibody tests, diffusely increased 131I (iodine-131) uptake in the thyroid gland, or exophthalmos [1,2]. All individuals classified as having GD were interviewed and examined by experienced clinicians. Plasma levels of thyroid stimulating hormone receptor (TSHR) autoantibody (TRAb) in all GD patients who had been treated with antithyroid drugs (ATD) for 1 year were re-measured by quantitative enzyme-linked immunosorbent assay (ELISA) (RSR Limited, United Kingdom) in our laboratory. Patients with TRAb levels ≥1.5 U/L were defined as persistent TRAb positive (pTRAb+), and those with TRAb levels <1.5 U/L were defined as non-persistent TRAb positive (pTRAb-), following the assay manual (the same TRAb cutoff was also applied in the clinical diagnosis of GD) [3,4].

All 1,516 controls in the GWAS stage were individuals with neither GD nor a family history of GD, and without any other autoimmune disorders. Control subjects were sex-matched to cases and were over 35 years old. Since GD and other autoimmune thyroid diseases (AITD) are most common in young women, this age criterion could reduce the number of controls who might develop GD later on.
To exclude clinical or subclinical AITD, the levels of sensitive TSH (sTSH) and TPOAb in control subjects were measured using chemiluminescence immunoassay (CLIA) in our laboratory. Of the 1,832 healthy controls whose sTSH and TPOAb levels were measured, 257 individuals with TPOAb levels ≥5.61 U/ml and 94 subjects with sTSH levels ≥4.94 µU/ml or ≤0.35 µU/ml were excluded; the remaining 1,516 served as the control cohort in the GWAS stage.

A total of 3,994 unrelated GD patients from the Chinese Han population were enrolled for the replication study in stage 2. The criteria for diagnosis of GD were the same as above. Among the 3,994 patients, TRAb levels were measured in 2,129 patients who had been treated with ATD for 1 year. The sex-matched controls consisted of 3,510 unrelated individuals, all older than 35 years (Supplementary Table 9).

DNA sample quality control

DNA was extracted from the blood samples collected from GD patients and control individuals using a FUJIFILM QuickGene-610L system. DNA concentrations were quantified using a NanoDrop 8000 (Thermo). The concentrations of the DNA samples used in this study were 50 ng/µl, and the A260/A280 ratios of the DNA samples were between 1.6 and 1.8.

GWAS genotyping and initial quality control

DNA samples from 1,536 GD cases and 1,516 controls were genotyped using Illumina Human660-Quad BeadChips at the Chinese National Human Genome Center in Shanghai, China. Genotype clustering was conducted with Illumina BeadStudio 3.3 software, which converted the fluorescence intensities into SNP genotypes. The mean call rate across all samples was 99.8%. Quality filtering was performed on SNPs and samples before analysis to ensure robust association tests. Cryptic relationships between genotyped individuals were examined by pairwise identity-by-descent (IBD) estimation in PLINK [5]. Pairs showing relatedness closer than first cousins (estimated proportional IBD > 0.125) were identified, and the sample with the lower call rate in each pair was removed.

Of the 657,366 markers assayed, the Y and mitochondrial SNPs, CNV-related markers, and Illumina controls were excluded, leaving 592,614 SNPs for further analysis. Next, 106,565 markers with Hardy-Weinberg equilibrium P ≤ 10^-6, genotype call rates below 98%, or minor allele frequency (MAF) < 0.01 were discarded, leaving 486,049 SNPs for subsequent analysis (Supplementary Fig. 1a). After removing samples with low call rates (< 98%, n = 20), gender inconsistencies (n = 6), and cryptic relatedness (n = 68), 2,958 samples were available for association analysis (Supplementary Fig. 1a).

Evaluation of population structure

Principal component analysis (PCA) and multidimensional scaling (MDS) analysis were performed on a subset of pruned markers (95,086 SNPs) using smartpca (one of the program modules in the EIGENSOFT software package) [6] and PLINK [5], respectively, to evaluate the population structure in the samples. For the PCA, our samples and the HapMap samples were plotted using the first two eigenvectors produced by smartpca [6]. For the MDS analysis, dimensions were calculated from the pairwise IBD distance among all individuals in both our cohort and the HapMap subjects using PLINK [5], and the first two dimensions were plotted. smartpca was also used to identify potential genetic outliers: the top ten principal components were computed using the HapMap populations, the subjects of the current cohort were projected onto those principal components, and outlier removal was then run (σ threshold = 6, five iterations).

Quantile-quantile plots

The observed P values of the SNPs (on the -log10 scale) were plotted against the theoretical distribution of expected P values to construct quantile-quantile plots.
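The observed-versus-expected comparison behind a quantile-quantile plot can be sketched as follows. This is a minimal illustration, not the authors' PLINK/R workflow: the function name `qq_points` and the plotting position (rank - 0.5)/n for the expected P values are assumptions made here, since the note does not state which convention was used.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs on the -log10 scale for a Q-Q plot.

    Assumption: under the null, the i-th smallest of n P values is expected
    at (i - 0.5) / n. Other plotting positions, such as i / (n + 1), are
    also in common use.
    """
    observed = sorted(p_values)
    n = len(observed)
    if n == 0:
        raise ValueError("at least one P value is required")
    points = []
    for rank, p in enumerate(observed, start=1):
        if not 0.0 < p <= 1.0:
            raise ValueError("P values must lie in (0, 1]")
        expected_p = (rank - 0.5) / n
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points
```

Plotting the second element of each pair against the first gives the Q-Q plot; points that lie on the diagonal indicate P values that match the null expectation.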
The genomic control inflation factor (λ) was calculated by dividing the median χ² statistic by 0.4563. The calculations and the plots were done using PLINK [5] and R statistics packages.

Association analysis

For SNPs and samples passing GWAS quality control, we tested for association using the Cochran-Armitage trend test implemented in PLINK [5]. Age-adjusted odds ratios were obtained by logistic regression analysis [5]. In stage 2, consistent with stage 1, we analyzed the association between each SNP and GD risk using the Cochran-Armitage trend test and logistic regression. Association analysis in the combined samples was carried out by Cochran-Mantel-Haenszel stratification analysis [5]. The heterogeneity of odds ratios among the different cohorts of the stage 1 and stage 2 studies was examined using the Breslow-Day test [7].

Conditional logistic regression analysis was performed to identify the SNPs independently associated with GD within each validated region. For the MHC region, which contains multiple independently associated SNPs, forward stepwise logistic regression analysis was carried out in the GWAS sample. For the non-MHC regions, conditional logistic regression analysis was performed in the combined samples of the two stages. The model assumed an additive effect on the logistic scale at the locus of interest and was restricted to individuals with complete genotyping across the SNPs being analyzed.

To test for heterogeneity in allele frequency between pTRAb+ and pTRAb- GD patients, we used logistic regression analysis of the combined cases, with the subclinical phenotypes as the outcome variables. Interactions between SNPs of TSHR and MHC in pTRAb+ or pTRAb- GD patients were analyzed using the case-only epistasis test implemented in PLINK [5].

Imputation analysis

Genotype imputation was performed using a hidden Markov model algorithm implemented in the software program MACH v.1.0.16 [8], with the HapMap II CHB phased haplotypes [9] as a reference. Only genotyped GWAS SNPs that passed stringent quality control were used as input for the imputation. Of the imputed SNPs, we analyzed only those that could be imputed with relatively high confidence (estimated r² between imputed SNP and true genotypes >0.3), had a minor allele frequency >1% in cases or controls as well as a HWE P value <0.01 in the control samples. To take imputation uncertainty into account, we used allelic dosage association as implemented in the program MACH2DAT [8]. The allelic dosage is the weighted sum of the genotype class probabilities. Finally, the association analysis of the imputed SNPs was carried out using the ProbABEL software [10].

Selection of SNPs for replication study

In the MHC region, five SNPs (rs4947296, rs1521, rs6903608, rs6457617 and rs2281388) that showed independent association with GD or GD subgroups were selected for replication. In non-MHC regions, 126 SNPs showed P values less than 1 × 10^-4, representing 38 independent chromosome loci. We selected SNPs in these 38 candidate regions for replication, including several reported disease SNPs in these regions. For example, the functionally well-studied SNP rs3087243 (CT60) in CTLA4, which is not present in the GWAS SNP sets, is in perfect linkage disequilibrium (LD) (r² = 1 in the HapMap CHB data) with rs231804 (P_GWAS = 1.34 × 10^-4); we therefore genotyped rs231804 in our replication study to assess rs3087243 indirectly. In the 6q15 region, a cluster of SNPs showed association with P < 1 × 10^-4. SNP rs11755527 at 6q15 was previously reported to be associated with type 1 diabetes and Crohn's disease [11]. Although it was not included on the Illumina chips, we genotyped it in the replication samples. Among these 38 loci, variants in three regions were not taken forward into replication because of difficulty in probe synthesis.
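The two statistics at the core of the association analysis described above, the Cochran-Armitage trend test and the genomic control inflation factor, can be sketched in a few lines. This is an illustrative stand-alone sketch, not PLINK's implementation: the function names are hypothetical, the genotype weights (0, 1, 2) assume an additive coding, and the default divisor 0.4563 is simply the value quoted in this note.

```python
import math
import statistics


def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on a 2 x 3 genotype table.

    cases, controls: genotype counts ordered by copies of the tested allele.
    Returns (chi-square with 1 d.f., P value).
    """
    n = [r + s for r, s in zip(cases, controls)]  # column totals
    total_cases = sum(cases)
    total_controls = sum(controls)
    total = total_cases + total_controls
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * c for w, c in zip(weights, n))
    sum_w2n = sum(w * w * c for w, c in zip(weights, n))
    spread = total * sum_w2n - sum_wn ** 2
    if total_cases == 0 or total_controls == 0 or spread == 0:
        raise ValueError("table is degenerate; the trend test is undefined")
    chi2 = (total * (total * sum_wr - total_cases * sum_wn) ** 2
            / (total_cases * total_controls * spread))
    # Upper tail of a chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def genomic_control_lambda(chi2_stats, null_median=0.4563):
    """Median observed chi-square divided by the null median.

    The default 0.4563 is the divisor given in this note (an assumption
    carried over from the text, not a recomputed constant).
    """
    return statistics.median(chi2_stats) / null_median
```

For example, `trend_test([20, 30, 50], [50, 30, 20])` gives a χ² of about 25.71. For reference, the median of a 1-d.f. χ² distribution is approximately 0.455, so the divisor used in the note is very close to the theoretical value.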
Additionally, five SNPs from different potential candidate regions were selected for replication, although their P values were between 1 × 10^-4 and 1 × 10^-3. In total, 95 SNPs representing 40 non-MHC candidate regions and 5 MHC SNPs were genotyped in 3,994 cases and 3,510 controls in the stage 2 replication study (Supplementary Table 1). The cluster plots were visually inspected for each selected SNP in BeadStudio.

Genotyping and quality control in stage 2 study

Among the 100 SNPs for replication in stage 2, 96 SNPs were genotyped using TaqMan SNP Genotyping Assays on the Fluidigm EP1 platform, and four SNPs (rs4947296, rs1521, rs6903608 and rs6457617) were genotyped on the Applied Biosystems 7900HT Fast Real Time PCR System according to the manufacturer's protocol. In the replication study, the DNA concentrations of all samples were standardized to 50 ng/µl in the 96-well plate, and one negative control (DNase- and RNase-free water) was placed in one of the 96 wells at random. For quality control, ninety-one samples genotyped on Illumina Human660-Quad BeadChips were re-genotyped for 96 SNPs on the Fluidigm EP1 platform to evaluate the concordance of genotypes between the two platforms.

Confirmation of previously reported GD loci

We confirmed the previously reported associations of GD with the MHC, TSHR, CTLA4 and FCRL3 loci (Table 1 and Supplementary Table 3) [2,12-18]. Our finding that variants in the MHC I and II regions had independent associations with GD agrees with previous findings [13]. Although the association of the MHC II genes with GD has been attributed to the HLA-DRB1-DQA1-DQB1 haplotype [12], we provide the first evidence that the HLA-DP gene locus confers an influence on the risk of GD that is independent of HLA-DRB1-DQA1-DQB1 [19]. Moreover, the strongest association in the MHC region was at the HLA-DP locus in the Chinese Han population.

The most significant SNP outside the MHC region was rs12101261 (P_combined = 6.64 × 10^-24), which is located in intron 1 of the TSHR gene on chromosome 14q31 (Supplementary Fig. 5a). This SNP also exhibited strong association in the replication study (Table 1).
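The cross-platform check described under "Genotyping and quality control in stage 2 study" amounts to comparing the genotypes called for the same sample and SNP on both platforms. The sketch below shows one way to compute such a concordance rate. The data layout (dictionaries keyed by sample and SNP), the function names and the missing-call codes ("0" and "N") are assumptions made for illustration, not details taken from the study.

```python
def normalize(genotype):
    """Return an order-independent genotype, or None for a missing call.

    Assumption: genotypes are two-letter strings such as "AG", and missing
    calls are absent, empty, or contain "0" or "N".
    """
    if not genotype:
        return None
    alleles = genotype.upper()
    if "0" in alleles or "N" in alleles:
        return None
    return "".join(sorted(alleles))


def concordance(calls_a, calls_b):
    """Fraction of identical genotypes among calls present on both platforms.

    calls_a, calls_b: dicts mapping (sample_id, snp_id) -> genotype string.
    Returns (concordance_rate, number_of_compared_calls).
    """
    matches = 0
    compared = 0
    for key, genotype_a in calls_a.items():
        a = normalize(genotype_a)
        b = normalize(calls_b.get(key))
        if a is None or b is None:
            continue  # skip calls missing on either platform
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no overlapping genotype calls to compare")
    return matches / compared, compared
```

Sorting the two alleles before comparison treats "AG" and "GA" as the same call. Strand flips between platforms are not handled here and would need a separate step.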

A cluster of SNPs in the CTLA4 region showed strong evidence for association with GD in the original scan (Supplementary Fig. 5b). Among four SNPs genotyped in the replication samples, three reached the genome-wide significance level. The best SNP, rs1024161 (P_combined = 2.34 × 10^-17), is located about 10 kb 5′ upstream of CTLA4 (Table 1 and Supplementary Table 3). Although the functionally well-studied SNP rs3087243 (CT60) was not present in the GWAS SNP sets, its P value in the imputation analysis was 3.9 × 10^-4. In addition, rs231804, which is in perfect LD with rs3087243 in the HapMap CHB data (r² = 1), showed significant association with GD in our combined population (P_combined = 2.12 × 10^-11). rs231804 (r² = 0.44 with the top CTLA4 SNP rs1024161 in our data) showed no significant association after conditioning on rs1024161 (P = 0.44), while rs1024161 remained significantly associated after conditioning on rs231804 (P = 4.63 × 10^-8, Supplementary Table 5).

Reports of association of the FCRL3 locus with GD have been less consistent [17,18,20]. In the initial GWAS, two SNPs located in a block of strong LD containing six genes reached genome-wide significance (Supplementary Table 3 and Supplementary Fig. 5c). Among six SNPs genotyped in the second cohort, SNP rs3761959 (P_combined = 1.50 × 10^-13), located within intron 3 of FCRL3, was most strongly associated with GD (Table 1 and Supplementary Table 3).

Cis-eQTL analysis

For the cis-eQTL analysis, we inspected the eQTL database developed by Dixon et al. [21]. This database contains 405 children of British descent, organized into 206 sibships, including 297 sib pairs and 11 half-sib pairs. The families were identified through a proband with childhood asthma, and siblings were included regardless of disease status. Global gene expression in lymphoblastoid cell lines (LCLs) was measured using Affymetrix HG-U133 Plus 2.0 chips.
All 405 children and their parents were genotyped using the Illumina Sentrix Human-1 Genotyping BeadChip (ILMN100K, including 105,713 autosomal SNPs), and 378 children were also genotyped using the Illumina Sentrix HumanHap300 BeadChip (ILMN300K, including 307,981 autosomal SNPs) according to the manufacturer's instructions [22]. Non-genetic contributions to the gene expression measures were estimated using PCA [23,24]. The top principal components were included in the eQTL regression model as covariates. The number of principal components used was chosen to maximize the number of genome-wide significant cis-eQTLs. Association analysis was performed with the FASTASSOC option implemented in MERLIN [25,26].

Cis-eQTL analysis of 6q27 region

There are 186 SNPs with P < 0.01 in a 200-kb region of association with GD on 6q27. From the cis-eQTL database developed by Dixon et al. [21], we found that the expression of RNASET2 was strongly associated with a cluster of SNPs. These SNPs were strongly associated with GD and were in an LD block containing RNASET2, with the strongest association at rs9355610 (P = 9.44 × 10^-13 for probeset 217983_s_at and P = 1.06 × 10^-14 for probeset 217984_at) and rs9366078 (P = 1.40 × 10^-12 for probeset 217983_s_at and P = 9.50 × 10^-15 for probeset 217984_at, Supplementary Fig. 6b). Notably, rs9355610 is in strong LD with rs9366078 (r² > 0.95, D′ = 1). These two SNPs (rs9355610 and rs9366078) were the SNPs most significantly associated with RNASET2 expression (Supplementary Fig. 6b) and accounted for 21.7% of the residual variance of RNASET2 expression after adjusting for non-genetic effects. Although rs9355610 was not associated with FGFR1OP and CCR6 expression, a cluster of SNPs associated with GD susceptibility was correlated with FGFR1OP gene expression, with the top SNP for FGFR1OP being rs1060404 (P = 6.6 × 10^-7, Supplementary Fig. 6b). However, CCR6 gene expression was not correlated with any SNP with P < 0.01 in the 200-kb GD risk region at 6q27 (Supplementary Fig. 6b). We also inspected a cis-eQTL database assessing the transcriptome of circulating monocytes from 1,490 individuals [27]. rs9355610 was not found in this database, since it was not included on the SNP chip used in that study.
However, rs9366076, which is in LD with rs9355610 (r² = 0.62) in our own data, was found to be correlated with RNASET2 expression (P = 8.69 × 10^-12). Together, these eQTL findings suggest that the 6q27 risk SNPs might have a strong influence on RNASET2 expression.

Cloning of the novel gene at 4p14

A 110-kb interval between CHRNA9 and RHOH in the 4p14 region was analyzed by inspecting the UCSC Genome Browser (http://genome.ucsc.edu). Five hypothetical transcripts were predicted in this region using Genscan and Ensembl, and one (ENST00000510551) contained three predicted exons, supported by several split expressed sequence tags (ESTs; Supplementary Fig. 8a). Primers for the 5′ and 3′ ends were designed based on the sequences of exon 1 and exon 3, respectively, of the ENST00000510551 predicted transcript. This transcript was confirmed by sequencing the product amplified by RT-PCR from Jurkat, a human T cell lymphoblast-like cell line (primer sequences shown in Supplementary Table 11). The approximately 900-bp transcript fragment contains a predicted open reading frame (ORF) encoding 114 amino acids (named GDCG4p14.1).

Next, a 5′ primer for RT-PCR was designed to anneal to the first exon of a predicted transcript (Chr4_20.52, Genscan Gene Predictions) upstream of GDCG4p14.1 to amplify the 5′ sequence of the novel gene (Fig. 3g and Supplementary Fig. 8a). Unexpectedly, a fragment of about 650 bp was amplified using a 5′ primer on exon 1 of Chr4_20.52 and a 3′ primer on exon 3 of GDCG4p14.1. Sequencing revealed that the fragment joined exon 1 and exon 2 of Chr4_20.52 to exon 2 and exon 3 of the transcript unit GDCG4p14.1, skipping exon 1 of GDCG4p14.1. This transcript, named GDCG4p14.2, encodes the same predicted ORF as GDCG4p14.1 (Fig. 3g, Supplementary Fig. 8b, c and Supplementary Table 11). No specific fragments located 5′ upstream of these two novel transcripts were amplified by nested PCR using 5′ primers on exons of a predicted transcript (Chr4_20.49, Genscan Gene Predictions) and EST BF515986, with a 3′ primer on exon 1 of GDCG4p14.1 and GDCG4p14.2, respectively (Supplementary Fig. 8a and Supplementary Table 11). Moreover, no fragments were amplified using 5′ primers on exon 1 and exon 2 of GDCG4p14.2 paired with a 3′ primer on exon 1 of GDCG4p14.1 (Supplementary Fig. 8a). These results suggest that GDCG4p14.1 and GDCG4p14.2 might be two isoforms arising from distinct transcriptional start sites.

The hypothetical protein had no hits in the protein database by Blastp analysis, and no motifs were identified by searching the PROSITE database. Orthologs of the human GDCG4p14 protein were searched for in the genome sequences of orangutan, rhesus, Mus musculus, rat, Bos taurus and Danio rerio using the reciprocal best hit method as described [28,29]. Orthologs of human GDCG4p14 were found only in primates and not in the other species examined, suggesting that it is a relatively late-appearing gene in evolution. The novel gene is conserved between human and the other primates (Supplementary Fig. 8d).

Non-synonymous SNPs in LD with the leading SNPs in GD risk loci

The LD of the leading SNPs in the MHC region and the five non-MHC regions with non-synonymous SNPs was inspected in the 1,000 Genomes Project database. Currently, no leading SNP from these six regions is in high LD (r² > 0.3) with a non-synonymous SNP within 1 Mb of its locus, except for SNP rs4947296 in the MHC region, which is in complete LD with three non-synonymous SNPs (rs2233985, rs2233984 and rs2233983) located in the C6orf15 (STG) gene. The STG gene, near HLA-C, has been associated with susceptibility to follicular lymphoma and is expressed in human tonsil tissue, but its function is unknown [30].
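The LD measures quoted throughout this note, r² and D′, can both be computed from the frequencies of the four two-SNP haplotypes. The sketch below assumes phased haplotype counts are available, which is a simplification: with unphased genotype data, haplotype frequencies would first have to be estimated. The function name and argument layout are hypothetical.

```python
def ld_from_haplotypes(n_AB, n_Ab, n_aB, n_ab):
    """Return (r_squared, d_prime) for two biallelic SNPs.

    Arguments are counts of the four phased haplotypes, where A/a are the
    alleles of the first SNP and B/b those of the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_AB - p_A * p_B  # raw disequilibrium coefficient
    r_squared = d * d / denominator
    # D' scales |D| by the largest value it could take given the allele
    # frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    return r_squared, d_prime
```

With counts (30, 0, 20, 50), D′ is 1 while r² is only about 0.43. This shows why the two measures are reported separately: D′ = 1 only says that one of the four haplotypes is absent, whereas r² close to 1 requires the two SNPs to have nearly identical allele frequencies as well.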

References

1. Song, H.D. et al. Functional SNPs in the SCGB3A2 promoter are associated with susceptibility to Graves' disease. Hum Mol Genet 18, 1156-70 (2009).
2. Zhao, S.X. et al. Association of the CTLA4 gene with Graves' disease in the Chinese Han population. PLoS One 5, e9821 (2010).
3. Carella, C. et al. Serum thyrotropin receptor antibodies concentrations in patients with Graves' disease before, at the end of methimazole treatment, and after drug withdrawal: evidence that the activity of thyrotropin receptor antibody and/or thyroid response modify during the observation period. Thyroid 16, 295-302 (2006).
4. Smith, B.R. et al. A new assay for thyrotropin receptor autoantibodies. Thyroid 14, 830-5 (2004).
5. Saxena, R. et al. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316, 1331-6 (2007).
6. Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38, 904-9 (2006).
7. Breslow, N.E. & Day, N.E. Statistical methods in cancer research. Volume I - The analysis of case-control studies. IARC Sci Publ, 5-338 (1980).
8. Li, Y., Willer, C., Sanna, S. & Abecasis, G. Genotype imputation. Annu Rev Genomics Hum Genet 10, 387-406 (2009).
9. Frazer, K.A. et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851-61 (2007).
10. Aulchenko, Y., Struchalin, M. & van Duijn, C. ProbABEL package for genome-wide association analysis of imputed data. BMC Bioinformatics 11, 134 (2010).
11. Smyth, D.J. et al. Shared and distinct genetic variants in type 1 diabetes and celiac disease. N Engl J Med 359, 2767-77 (2008).
12. Simmonds, M.J. et al. Regression mapping of association between the human leukocyte antigen region and Graves disease. Am J Hum Genet 76, 157-63 (2005).
13. Simmonds, M.J. et al. A novel and major association of HLA-C in Graves' disease that eclipses the classical HLA-DRB1 effect. Hum Mol Genet 16, 2149-53 (2007).
14. Ueda, H. et al. Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature 423, 506-11 (2003).
15. Brand, O.J. et al. Association of the thyroid stimulating hormone receptor gene (TSHR) with Graves' disease. Hum Mol Genet 18, 1704-13 (2009).
16. Hiratani, H. et al. Multiple SNPs in intron 7 of thyrotropin receptor are associated with Graves' disease. J Clin Endocrinol Metab 90, 2898-903 (2005).
17. Chistiakov, D.A. & Chistiakov, A.P. Is FCRL3 a new general autoimmunity gene? Hum Immunol 68, 375-83 (2007).
18. Kochi, Y. et al. A functional variant in FCRL3, encoding Fc receptor-like 3, is associated with rheumatoid arthritis and several autoimmunities. Nat Genet 37, 478-85 (2005).
19. Kamatani, Y. et al. A genome-wide association study identifies variants in the HLA-DP locus associated with chronic hepatitis B in Asians. Nat Genet 41, 591-5 (2009).
20. Burton, P.R. et al. Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants. Nat Genet 39, 1329-37 (2007).
21. Dixon, A.L. et al. A genome-wide association study of global gene expression. Nat Genet 39, 1202-7 (2007).
22. Moffatt, M.F. et al. Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma. Nature 448, 470-3 (2007).
23. Leek, J.T. & Storey, J.D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet 3, 1724-35 (2007).
24. Stegle, O., Kannan, A., Durbin, R. & Winn, J. Accounting for non-genetic factors improves the power of eQTL studies. 411-22 (2008).
25. Abecasis, G.R., Cherny, S.S., Cookson, W.O. & Cardon, L.R. Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30, 97-101 (2002).
26. Chen, W.M. & Abecasis, G.R. Family-based association tests for genomewide association scans. Am J Hum Genet 81, 913-26 (2007).
27. Zeller, T. et al. Genetics and Beyond - The Transcriptome of Human Monocytes and Disease Susceptibility. PLoS One 5, e10693 (2010).
28. Song, H.D. et al. Hematopoietic gene expression profile in zebrafish kidney marrow. Proc Natl Acad Sci U S A 101, 16240-5 (2004).
29. Barbazuk, W.B. et al. The syntenic relationship of the zebrafish and human genomes. Genome Res 10, 1351-8 (2000).
30. Skibola, C.F. et al. Genetic variants at 6p21.33 are associated with susceptibility to follicular lymphoma. Nat Genet 41, 873-5 (2009).

Supplementary Figures

Supplementary Figure 1 The flowchart for quality filtering in the two-stage study for GWAS
Supplementary Figure 2 Summary of association results from the genome-wide scan
Supplementary Figure 3 Plots of principal component analysis (PCA) and multidimensional scaling (MDS) analysis in our cohorts and the HapMap samples
Supplementary Figure 4 Plots of the results from forward stepwise logistic regression analysis within MHC region in stage 1 study
Supplementary Figure 5 Regional plots of TSHR, CTLA4 and FCRL3 loci associated with GD
Supplementary Figure 6 Association of SNPs with GD and transcript abundances of three genes (RNASET2, FGFR1OP and CCR6) at 6q27
Supplementary Figure 7 LD map of 6q27 and the disease-associated SNPs
Supplementary Figure 8 Schematic structure of the GDCG4p14 gene based on human reference sequence (GRCh37)
Supplementary Figure 9 Expression profiles of selected positional candidate genes in various human tissues by real-time RT-PCR

Supplementary Tables

Supplementary Table 1 Summary of non-MHC SNPs in 38 candidate regions with P < 1 × 10⁻⁴ in the GWAS data and five SNPs from potential candidate regions with 1 × 10⁻⁴ < P < 1 × 10⁻³ selected for replication
Supplementary Table 2 The call rates of 100 SNPs in the combined cohorts for replication
Supplementary Table 3 Association results of 100 SNPs with GD in the initial genome-wide scan and the replication analysis
Supplementary Table 4 Forward stepwise logistic regression analysis of MHC SNPs from the GWAS data in different case-control groups
Supplementary Table 5 Conditional logistic regression analysis for SNPs in the four regions associated with Graves disease in the combined cohorts
Supplementary Table 6 Association results of the imputed and typed SNPs in 6q27 region with GD in initial genome-wide scan
Supplementary Table 7 Conditional logistic regression analysis for previously reported disease-associated SNPs and rs9355610 at 6q27 associated with GD in initial genome-wide scan data
Supplementary Table 8 Association results of the imputed and typed SNPs in 4p14 region with GD in initial genome-wide scan
Supplementary Table 9 Description of the sample sets in the current study
Supplementary Table 10 Concordance of the genotypes in 91 samples analyzed by Illumina Human660-Quad BeadChips and Fluidigm EP1-TaqMan SNP Genotyping Assays
Supplementary Table 11 Primers used for quantitative real-time PCR assays and cloning novel gene at 4p14

Supplementary Figure 1. The flowchart for quality filtering in the two-stage study for GWAS. In stage 1, quality filtering was performed on SNPs and samples before analysis to ensure robust association tests. Of the 657,366 markers assayed, the Y-chromosome and mitochondrial SNPs, CNV-related markers and Illumina controls were excluded, leaving 592,614 SNPs for further analysis (a). We then excluded SNPs showing a high missing call rate (> 2%), a low minor allele frequency in the population (MAF < 1%) or significant deviation from Hardy-Weinberg equilibrium in the controls (P ≤ 10⁻⁶). Samples were excluded if they had a low call rate (< 98%, n = 20), gender inconsistencies (n = 6) or cryptic relatedness (n = 68) (a). In the replication study, one of the 100 SNPs was removed because its call rate was below 95%, and 246 samples were excluded from further analysis because their call rate was below 95% (b).

a Stage 1 study
SNPs: 657,366 SNPs assayed; 64,752 SNPs excluded (located on the Y chromosome, mitochondrial or CNV-related, and Illumina controls); 106,565 SNPs excluded (missing call rate > 2%, MAF < 1% or HWE P ≤ 10⁻⁶); 486,049 SNPs retained.
Samples: 1,536 GD patients and 1,516 controls; 94 samples excluded (call rate < 98%, gender inconsistencies or cryptic relatedness); 1,468 GD patients and 1,490 controls retained.

b Stage 2 study
SNPs: 100 SNPs; 1 SNP excluded (call rate < 95%, HWE P ≤ 10⁻³); 99 SNPs retained.
Samples: 3,994 GD patients and 3,510 controls; 246 samples excluded (call rate < 95%); 3,832 GD patients and 3,426 controls retained.
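The stage 1 SNP filters in panel a can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the Hardy-Weinberg check uses a 1-d.f. chi-square test (the test actually used is not stated in the caption), and the MAF is computed over cases and controls combined.

```python
import math

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """1-d.f. chi-square P value for Hardy-Weinberg equilibrium (assumed test)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic marker: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

def passes_snp_qc(case_counts, control_counts, n_missing,
                  max_missing=0.02, min_maf=0.01, hwe_p=1e-6):
    """Apply the three stage 1 SNP filters to genotype counts (aa, ab, bb)."""
    called = sum(case_counts) + sum(control_counts)
    total = called + n_missing
    if called == 0 or n_missing / total > max_missing:
        return False  # missing call rate > 2%
    aa = case_counts[0] + control_counts[0]
    ab = case_counts[1] + control_counts[1]
    freq_a = (2 * aa + ab) / (2 * called)
    if min(freq_a, 1 - freq_a) < min_maf:
        return False  # MAF < 1%
    if hwe_chi2_pvalue(*control_counts) <= hwe_p:
        return False  # HWE deviation in controls
    return True
```

A marker is retained only if all three checks pass; the thresholds default to the values given in the flowchart.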

Supplementary Figure 2. Summary of association results from the genome-wide scan. (a-c) Genome-wide association results for GD from the initial scan. The plots show the genome-wide P values (-log₁₀ P) of the Cochran-Armitage trend test for 486,049 polymorphic SNPs in 1,468 GD cases (a), 997 pTRAb+ GD patients (b) and 410 pTRAb- GD patients (c), each compared with 1,490 control samples. Each chromosome is depicted in a different color. The red horizontal line represents P = 1 × 10⁻⁵. (d-f) Quantile-quantile plots of the GWAS results for all SNPs (in red) and after exclusion of SNPs in the MHC region (in blue), shown for the test statistics in 1,468 GD cases (d), 997 pTRAb+ GD patients (e) and 410 pTRAb- GD patients (f), each compared with 1,490 control samples.
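For readers unfamiliar with the Cochran-Armitage trend test named in this caption, a minimal sketch follows. It assumes additive weights (0, 1, 2) for the three genotype classes and the large-sample variance with N in the denominator; the function name and the example counts are hypothetical, not taken from the study.

```python
import math

def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Return (chi2, P) of the 1-d.f. Cochran-Armitage trend test.

    cases, controls: genotype counts ordered by copies of the tested allele.
    """
    n = [r + s for r, s in zip(cases, controls)]   # column totals
    R, S = sum(cases), sum(controls)               # row totals
    N = R + S
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * k for w, k in zip(weights, n))
    sum_w2n = sum(w * w * k for w, k in zip(weights, n))
    t = N * sum_wr - R * sum_wn                    # trend statistic T
    var_term = R * S * (N * sum_w2n - sum_wn ** 2) # N * Var(T)
    if var_term == 0:
        return 0.0, 1.0
    chi2 = N * t * t / var_term
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: the tested allele is enriched in cases
chi2, p = cochran_armitage_trend((10, 40, 50), (50, 40, 10))
print(f"chi2 = {chi2:.2f}, -log10(P) = {-math.log10(p):.1f}")
```

The -log₁₀ transform in the last line is the quantity plotted on the vertical axis of panels a-c.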

Supplementary Figure 3. Plots of principal component analysis (PCA) and multidimensional scaling (MDS) analysis in our cohorts and the HapMap samples. (a) The case-control cohorts (1,468 GD cases and 1,490 controls) and 270 individuals from the HapMap data were plotted using the first two eigenvectors produced by the EIGENSOFT software package. (b) The case-control cohorts and the 270 HapMap samples were plotted by their first and second dimension values. MDS analysis was done using the PLINK1 (http://pngu.mgh.harvard.edu/purcell/plink/) and R statistics packages (http://r-project.org). The identity-by-descent (IBD) pairwise distances among all the case-control cohorts and HapMap groups were used to construct the dimensions. In both (a) and (b), GD cases, controls, and the Chinese (CHB), Japanese (JPT), Caucasian (CEU) and Yoruban (YRI) HapMap groups are plotted in red, black, green, blue, dark red and bright red, respectively. The current cohorts clearly cluster together with the CHB and JPT components of the HapMap. The boxed areas in panels (a) and (b) are magnified in (e) and (f), respectively. (c) The current cohorts were plotted based on the first two eigenvectors obtained by PCA. (d) The current cohorts were plotted by the first and second dimension values obtained by MDS analysis. No significant population structure was observed in our current cohorts.

Supplementary Figure 4. Plots of the results from forward stepwise logistic regression analysis within MHC region in stage 1 study. (a) Analysis of the 252 SNPs that showed genome-wide significant association (P ≤ 5.0 × 10⁻⁸) in the comparison of 1,468 GD cases vs. 1,490 control subjects. The logistic regression analyses were restricted to individuals with complete genotyping across all 252 SNPs. The unconditioned (original) P values are shown as red dots, and the P values of the other SNPs after conditioning on rs2281388, rs4947296, rs6903608 and rs6457617 are shown as blue dots. After conditioning on these four SNPs, none of the other SNPs in the MHC region remained genome-wide significant. (b) Analysis of the 268 SNPs that showed genome-wide significant association (P ≤ 5.0 × 10⁻⁸) in the comparison of 997 pTRAb+ GD patients vs. 1,490 control subjects. The forward stepwise logistic regression analyses were restricted to individuals with complete genotyping across all 268 SNPs. The unconditioned (original) P values are shown as red dots, and the P values after conditioning on rs2281388, rs1521 and rs4947296 are shown as blue dots. After conditioning on these three SNPs, none of the other SNPs remained genome-wide significant.

Supplementary Figure 5. Regional plots of TSHR, CTLA4 and FCRL3 loci associated with GD. P values were calculated by comparing all GD cases with controls. Each diamond represents one SNP, and the most strongly associated SNP is indicated by a large red diamond. Blue diamonds represent P values in the combined analysis of GWAS and replication data. The color of each SNP spot reflects its r² with the top SNP (large red diamond) within each association locus, changing from red to white. Estimated recombination rates (based on the combined CHB and JPT samples from the HapMap project) are plotted in cyan to reflect the local LD structure around the associated SNPs. Gene annotations were adapted from the University of California at Santa Cruz Genome Browser (http://genome.ucsc.edu/). (a) TSHR region on 14q31. (b) CTLA4 region on 2q33. (c) FCRL3 region on 1q21-22.

Supplementary Figure 6. Association of SNPs with GD and transcript abundances of three genes (RNASET2, FGFR1OP and CCR6) at 6q27. (a) Association results for SNPs with GD at 6q27. The red lines indicate P values of SNPs genotyped by Illumina Human660-Quad BeadChips in our stage 1 study, while the blue lines indicate P values of SNPs imputed based on HapMap II CHB phased haplotypes. (b) Plot of linkage disequilibrium (LD) structures at 6q27 and correlation of SNPs with transcript abundances of three genes (RNASET2-FGFR1OP-CCR6) at 6q27. The LD structures were analyzed by Haploview software version 4.2 based on the data of the CHB+JPT population from HapMap phase II release 22. The LD color scheme is stratified according to the logarithm of the odds (LOD) score and D': LOD < 2 (white for D' < 1 and blue for D' = 1) or LOD > 2 (shades of pink/red for D' < 1 and bright red for D' = 1). Red dots and triangles indicate the association results of SNPs with the expression level of RNASET2. Blue dots and triangles indicate the association results of SNPs with the expression level of FGFR1OP. Dots and triangles denote expression levels of the gene detected by different probe sets. CCR6 expression level was not associated with any SNP at 6q27 with P < 0.01 in our GWAS data.

Supplementary Figure 7. LD map of 6q27 and the disease-associated SNPs. (a-c) LD structure for the region 167,200-167,510 kb at 6q27 in the CHB (a), CEU (b) and YRI (c) populations from the HapMap phase III release 2. Coloring in the figure is according to r². The disease SNPs are indicated by green lines and include the Crohn's disease associated SNP rs2301436 (P_GWAS = 5.88 × 10⁻⁷), the rheumatoid arthritis associated SNP rs3093023 (P_GWAS = 2.63 × 10⁻³), and the vitiligo associated SNPs rs2236313 (P_GWAS = 3.02 × 10⁻⁵) and rs6902119 (P_GWAS = 1.33 × 10⁻⁵). (d) LD plots of rs9355610 and the five previously reported autoimmune SNPs. The r² values were estimated from the genotype data of GD cases and controls enrolled in the GWAS. The plots were constructed using Haploview software version 4.2, and r² (× 100) values are depicted in the diamonds.

[Panel d: LD plot of SNPs 1-6 (rs2236313, rs9355610, rs2301436, rs6902119, rs3093024, rs3093023). Pairwise r² (× 100) values shown in the diamonds: 63, 70, 92, 66, 99, 80, 64, 64, 66, 73, 42, 64, 50, 42, 50.]
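The two LD measures used in the plots above, D' and r², can be computed from two-locus haplotype counts as sketched below. This is an illustration only: it assumes phased haplotype counts are already available (Haploview estimates haplotype frequencies from unphased genotypes, which is not reproduced here), and the function name and example counts are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D', r2) for two biallelic SNPs from phased haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B                     # raw disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    # Largest |D| attainable given the allele frequencies
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return abs(d) / d_max, d * d / denom

# Hypothetical counts: one haplotype is absent, so D' = 1 while r2 is about 1/3
print(ld_stats(25, 0, 25, 50))
```

The example shows why the two measures are reported separately: D' reaches 1 whenever one of the four haplotypes is missing, whereas r² reaches 1 only when the two SNPs carry the same information.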

Supplementary Figure 8. Schematic structure of the GDCG4p14 gene based on human reference sequence (GRCh37). (a) Inspection of the UCSC Genome Browser (http://genome.ucsc.edu) revealed five hypothetical genes and several expressed sequence tags (ESTs) within the 110-kb interval between CHRNA9 and RHOH at 4p14. The exon organization of GDCG4p14 and the putative ORF are shown in the region with a grey background, and exons are marked by red rectangles. GDCG4p14 has two isoforms: the cDNA of GDCG4p14.1 contains three exons and the cDNA of GDCG4p14.2 contains four exons. Human ESTs are shown in black. The transcripts chr4_20.48, chr4_20.49, chr4_20.52 and chr4_20.53 were predicted by Genscan, and ENST00000510551 was predicted by Ensembl. (b, c) Graphical overview of the six reading frames of the cDNA sequences of GDCG4p14.1 and GDCG4p14.2. Start codons are indicated by ticks above the horizontal line and stop codons by ticks under the horizontal line. The same putative open reading frame (ORF) encoding 114 amino acids in GDCG4p14.1 and GDCG4p14.2 is shown in green. (d) Comparison of the orthologs of GDCG4p14 in primates and human. Orthologs of the human GDCG4p14 protein were sought in Orangutan, Rhesus, Mus musculus, Rat, Bos taurus and Danio rerio by searching the genome sequences using the reciprocal best hit search methods. Orthologs of human GDCG4p14 were found in the primates but not in the other species examined. *, amino acids identical among all species; :, amino acids that are positives among all species.
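The reciprocal best hit criterion mentioned in panel d can be illustrated as follows. This is a minimal sketch, assuming similarity scores between sequences of two species have already been obtained from a search tool; the function names, sequence identifiers and scores are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict of {(query, subject): similarity score}.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Hypothetical scores for a human query against another genome, and back
human_vs_other = {("hs_gene", "other_1"): 200, ("hs_gene", "other_2"): 150}
other_vs_human = {("other_1", "hs_gene"): 190, ("other_2", "hs_other"): 300}
print(reciprocal_best_hits(human_vs_other, other_vs_human))
```

A candidate is accepted as an ortholog only when the best hit holds in both search directions, which guards against matching a paralog.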

Supplementary Figure 9. Expression profiles of selected positional candidate genes in various human tissues by real-time RT-PCR. The mRNA level of GDCG4p14 in Jurkat cells was defined as one, and those of the candidate genes in all tissues are presented as a fraction of this. (a) The RNASET2 gene was widely expressed in most human tissues/cells, especially in thyroid and immune-related tissues (such as lymph node, PBMC, thymus and spleen tissues). (b) FGFR1OP was expressed at high levels in human testis and spleen tissues, and at relatively low levels in most other tissues/cells. (c) Expression of GDCG4p14 was detected at low levels in most human tissues/cells, but at relatively high levels in a human T cell line (Jurkat cells).

[Panels: a, RNASET2; b, FGFR1OP; c, GDCG4p14. Vertical axis in each panel runs from 0.0 to 1.2.]
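The caption does not state how the relative mRNA levels were derived from the real-time RT-PCR readout. One common convention is the 2^-ΔΔCt method, and the sketch below assumes it purely for illustration: the function name, the use of a reference (housekeeping) gene and all Ct values are hypothetical, not taken from the study.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Relative level by the 2^-ddCt method (assumed, not stated in the source).

    The calibrator sample (here, by analogy, the Jurkat measurement defined
    as one) returns exactly 1.0.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)

# Hypothetical Ct values: the sample crosses threshold one cycle later than
# the calibrator, so its level is half that of the calibrator.
print(relative_expression(26.0, 18.0, 25.0, 18.0))
```

Under this convention, each additional cycle needed to reach threshold halves the estimated transcript level relative to the calibrator.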

Supplementary Table 1. Summary of non-MHC SNPs in 38 candidate regions with P < 1 × 10⁻⁴ in the GWAS data and five SNPs from potential candidate regions with 1 × 10⁻⁴ < P < 1 × 10⁻³ selected for replication

Locus  Chr.  SNP  Position  Annotated gene  Reference allele  Control RAF  Case RAF  P value  OR  95% CI
1  1  rs10908583  155908307  FCRL4/FCRL3  T  0.44  0.50  1.47×10⁻⁵  1.26  1.14-1.39
1  1  rs2210911  155910491  FCRL4/FCRL3  G  0.37  0.42  2.54×10⁻⁵  1.26  1.13-1.39
1  1  rs7522061  155935014  FCRL3  C  0.38  0.45  1.65×10⁻⁸  1.35  1.22-1.50
1  1  rs3761959  155935902  FCRL3  A  0.38  0.45  2.22×10⁻⁸  1.35  1.22-1.50
1  1  rs7517644  155983652  FCRL2  G  0.13  0.17  2.35×10⁻⁷  1.46  1.26-1.69
1  1  rs12743184  156020983  FCRL2/FCRL1  G  0.13  0.18  1.16×10⁻⁷  1.47  1.28-1.70
1  1  rs2050568  156036865  FCRL1  T  0.36  0.43  3.70×10⁻⁸  1.34  1.21-1.49
1  1  rs6689427  156047516  FCRL1  G  0.32  0.40  1.13×10⁻⁸  1.37  1.23-1.52
1  1  rs2765493  156064624  FCRL1/CD5L  G  0.36  0.44  1.39×10⁻⁸  1.36  1.22-1.50
1  1  rs2765502  156071731  CD5L  G  0.13  0.18  1.66×10⁻⁶  1.41  1.23-1.63
1  1  rs2260040  156078016  CD5L  G  0.13  0.18  1.39×10⁻⁶  1.42  1.23-1.64
2  1  rs4233131  185960354  PLA2G4A/FAM5C  A  0.26  0.31  2.94×10⁻⁵  1.27  1.14-1.43
3  2  rs2257197  6095998  SOX11/CMPK2  G  0.19  0.24  5.83×10⁻⁵  1.29  1.14-1.46
4  2  rs1013864  6968622  RSAD2/RNF144A  C  0.30  0.35  1.21×10⁻⁵  1.27  1.14-1.42
4  2  rs3806609  6973634  RSAD2/RNF144A  A  0.27  0.32  2.00×10⁻⁵  1.27  1.14-1.42
4  2  rs771274  6978080  RSAD2/RNF144A  A  0.39  0.44  9.82×10⁻⁶  1.27  1.14-1.40
4  2  rs771283  6980852  RSAD2/RNF144A  A  0.27  0.32  3.71×10⁻⁵  1.27  1.13-1.42
5  2  rs848551  36554143  CRIM1  T  0.19  0.23  6.64×10⁻⁵  1.29  1.14-1.46
6  2  rs6753127  46450800  EPAS1  C  0.89  0.92  4.80×10⁻⁵  1.45  1.21-1.73
7  2  rs231804  204416891  CD28/CTLA4  T  0.82  0.86  1.34×10⁻⁴  1.32  1.14-1.51
7  2  rs1024161  204429997  CD28/CTLA4  T  0.67  0.73  3.81×10⁻⁸  1.37  1.22-1.53
7  2  rs926169  204430997  CD28/CTLA4  T  0.66  0.73  8.48×10⁻⁸  1.35  1.21-1.51
7  2  rs231726  204449111  CTLA4/ICOS  T  0.63  0.68  9.16×10⁻⁶  1.27  1.14-1.42
7  2  rs10197319  204471289  CTLA4/ICOS  G  0.73  0.78  8.76×10⁻⁵  1.27  1.12-1.43
7  2  rs3096851  204472127  CTLA4/ICOS  C  0.62  0.67  3.69×10⁻⁵  1.25  1.13-1.39
7  2  rs3116504  204477299  CTLA4/ICOS  G  0.62  0.67  3.57×10⁻⁵  1.25  1.13-1.39
8  2  rs2371438  212673776  ERBB4  A  0.91  0.94  2.52×10⁻⁵  1.51  1.24-1.84
9  3  rs1456078  36490720  STAC  G  0.44  0.49  2.58×10⁻⁵  1.25  1.12-1.38
9  3  rs9881075  36492939  STAC  C  0.43  0.48  6.39×10⁻⁵  1.23  1.11-1.37
9  3  rs17035210  36528606  STAC  T  0.64  0.70  1.34×10⁻⁵  1.27  1.14-1.42
10  3  rs333284  125894843  KALRN  A  0.19  0.23  1.47×10⁻⁴  1.28  1.12-1.45
11  4  rs6832151  39998408  RHOH/CHRNA9  G  0.34  0.40  2.99×10⁻⁶  1.28  1.16-1.43
12  4  rs1453474  72642488  SLC4A4  G  0.22  0.26  3.25×10⁻⁵  1.29  1.14-1.45
12  4  rs1377528  72644130  SLC4A4  A  0.20  0.24  7.46×10⁻⁵  1.28  1.14-1.45
12  4  rs1453458  72644727  SLC4A4  G  0.22  0.26  4.18×10⁻⁵  1.28  1.14-1.45
12  4  rs1563091  72646945  SLC4A4  A  0.22  0.26  3.66×10⁻⁵  1.29  1.14-1.45
13  4  rs1429638  74956794  CXCL1/PF4  A  0.28  0.33  5.68×10⁻⁵  1.26  1.12-1.40
14  4  rs1052325  82103746  C4orf22  A  0.94  0.96  1.05×10⁻⁴  1.59  1.25-2.02
15  6  rs2474618  90933565  BACH2/MAP3K7  C  0.53  0.58  6.81×10⁻⁵  1.24  1.12-1.37
15  6  rs2474619  90936756  BACH2/MAP3K7  A  0.63  0.68  1.07×10⁻⁵  1.28  1.15-1.42
15  6  rs2501720  90956282  BACH2/MAP3K7  A  0.53  0.58  9.81×10⁻⁵  1.23  1.11-1.36
15  6  rs12209546  90969315  BACH2/MAP3K7  T  0.53  0.59  2.87×10⁻⁵  1.25  1.13-1.38
15  6  rs370409  90978461  BACH2/MAP3K7  T  0.63  0.68  1.08×10⁻⁵  1.28  1.15-1.42
15  6  rs206916  90984449  BACH2/MAP3K7  G  0.54  0.59  3.78×10⁻⁵  1.24  1.12-1.38
15  6  rs9344996  90986022  BACH2/MAP3K7  C  0.35  0.40  2.70×10⁻⁵  1.25  1.13-1.39
15  6  rs206913  90993615  BACH2/MAP3K7  T  0.56  0.61  6.14×10⁻⁵  1.24  1.12-1.38
15  6  rs2655387  91004719  BACH2/MAP3K7  G  0.63  0.68  2.01×10⁻⁵  1.27  1.14-1.41
15  6  rs3757247  91014184  BACH2/MAP3K7  A  0.48  0.52  2.67×10⁻⁴  1.21  1.1-1.34
15  6  rs661713  91032720  BACH2/MAP3K7  T  0.50  0.55  6.54×10⁻⁵  1.23  1.11-1.37
15  6  rs953233  91052538  BACH2/MAP3K7  A  0.35  0.40  4.34×10⁻⁵  1.25  1.12-1.38
15  6  rs10498965  91056323  BACH2/MAP3K7  G  0.35  0.40  4.84×10⁻⁵  1.24  1.12-1.38
15  6  rs597325  91059215  BACH2/MAP3K7  G  0.48  0.53  3.34×10⁻⁴  1.2  1.09-1.33
16  6  rs2205748  104569248  GRIK2/HACE1  C  0.59  0.64  6.44×10⁻⁵  1.24  1.11-1.37
17  6  rs11153415  113751835  RFPL4B/MARCKS  A  0.21  0.26  1.19×10⁻⁴  1.27  1.12-1.43
18  6  rs3777722  167272094  RNASET2  C  0.60  0.66  5.37×10⁻⁶  1.28  1.15-1.42
18  6  rs3777723  167273691  RNASET2  G  0.53  0.59  1.64×10⁻⁵  1.25  1.13-1.39
18  6  rs2236312  167280094  RNASET2  G  0.59  0.64  2.36×10⁻⁵  1.26  1.13-1.40
18  6  rs2236313  167280379  RNASET2  T  0.39  0.44  3.02×10⁻⁵  1.25  1.13-1.39
18  6  rs1079145  167280714  RNASET2  C  0.59  0.65  2.60×10⁻⁵  1.26  1.13-1.39
18  6  rs9366076  167293698  RNASET2/FGFR1OP  C  0.59  0.65  1.25×10⁻⁶  1.3  1.17-1.45
18  6  rs9355610  167303065  RNASET2/FGFR1OP  G  0.47  0.54  1.31×10⁻⁷  1.32  1.19-1.46
18  6  rs429083  167303962  RNASET2/FGFR1OP  G  0.39  0.44  3.64×10⁻⁵  1.24  1.12-1.38
18  6  rs9366078  167319502  RNASET2/FGFR1OP  A  0.47  0.54  1.31×10⁻⁷  1.32  1.19-1.46
18  6  rs933243  167323863  RNASET2/FGFR1OP  G  0.47  0.54  1.50×10⁻⁷  1.32  1.19-1.46
18  6  rs400837  167330998  RNASET2/FGFR1OP  C  0.39  0.44  3.64×10⁻⁵  1.24  1.12-1.38
18  6  rs12526548  167351137  FGFR1OP  T  0.39  0.45  5.98×10⁻⁷  1.31  1.18-1.45
18  6  rs2301436  167357978  FGFR1OP  A  0.39  0.45  5.88×10⁻⁷  1.31  1.18-1.45
18  6  rs162295  167374087  FGFR1OP/CCR6  T  0.58  0.64  1.13×10⁻⁶  1.3  1.17-1.44
18  6  rs162297  167375129  FGFR1OP/CCR6  T  0.55  0.61  3.75×10⁻⁵  1.25  1.12-1.38
18  6  rs12529876  167381491  FGFR1OP/CCR6  A  0.39  0.45  1.04×10⁻⁶  1.3  1.17-1.44
18  6  rs10484530  167381552  FGFR1OP/CCR6  A  0.30  0.35  1.26×10⁻⁴  1.24  1.11-1.38
18  6  rs6921588  167414387  FGFR1OP/CCR6  A  0.46  0.53  1.38×10⁻⁶  1.29  1.17-1.43
18  6  rs204295  167420552  FGFR1OP/CCR6  C  0.46  0.53  7.79×10⁻⁷  1.3  1.17-1.44
18  6  rs1331301  167422628  FGFR1OP/CCR6  T  0.58  0.64  4.61×10⁻⁶  1.28  1.15-1.42
18  6  rs6902119  167425781  FGFR1OP/CCR6  C  0.38  0.44  1.33×10⁻⁵  1.26  1.14-1.40
18  6  rs150110  167431756  FGFR1OP/CCR6  T  0.59  0.64  8.28×10⁻⁵  1.24  1.11-1.37
19  7  rs17740440  23321766  IGF2BP3  C  0.96  0.98  5.69×10⁻⁵  1.97  1.41-2.74
20  7  rs10252228  34906564  NPSR1/DPY19L1  A  0.81  0.85  8.13×10⁻⁵  1.32  1.15-1.51
21  7  rs981490  79433765  Intergenic  C  0.82  0.85  2.49×10⁻⁴  1.3  1.13-1.49
22  7  rs17168135  134302569  CALD1  G  0.81  0.85  1.65×10⁻⁵  1.35  1.18-1.55
23  7  rs12669418  146332413  CNTNAP2  G  0.76  0.80  8.32×10⁻⁵  1.28  1.13-1.45
24  8  rs2088267  106972695  ZFPM2/OXR1  C  0.74  0.78  8.21×10⁻⁵  1.27  1.13-1.44
25  8  rs16884651  114509354  CSMD3/TRPS1  G  0.49  0.54  7.63×10⁻⁵  1.23  1.11-1.36
26  8  rs7005834  134283386  WISP1  C  0.83  0.87  6.36×10⁻⁵  1.34  1.16-1.55
27  8  rs728827  143191447  FLJ43860/TSNARE1  G  0.85  0.88  5.61×10⁻⁵  1.36  1.17-1.58
28  9  rs8176749  135121009  ABO  G  0.75  0.79  1.05×10⁻³  1.23  1.09-1.38
28  9  rs8176746  135121143  ABO  C  0.75  0.79  9.67×10⁻⁴  1.23  1.09-1.39
28  9  rs657152  135129086  ABO  G  0.53  0.58  8.32×10⁻⁵  1.23  1.11-1.37
28  9  rs505922  135139050  ABO  T  0.52  0.58  6.40×10⁻⁶  1.27  1.15-1.41
28  9  rs630014  135139543  ABO  T  0.34  0.39  8.45×10⁻⁶  1.27  1.14-1.42
29  10  rs1219525  56412019  PCDH15/ZWINT  A  0.80  0.84  8.58×10⁻⁵  1.31  1.14-1.49
30  10  rs4147233  63652674  RTKN2  A  0.06  0.08  6.49×10⁻⁵  1.51  1.23-1.85
30  10  rs3910172  63737514  RTKN2/ZNF365  G  0.05  0.07  9.12×10⁻⁵  1.53  1.23-1.89
30  10  rs3864806  63739041  RTKN2/ZNF365  A  0.05  0.08  2.64×10⁻⁵  1.56  1.27-1.91
31  12  rs12368910  7491941  CD163L1/CD163  T  0.33  0.38  2.66×10⁻⁵  1.26  1.13-1.40
32  12  rs10861159  103030448  HCFC2/NFYB  A  0.37  0.42  1.72×10⁻⁴  1.22  1.1-1.36
33  12  rs6489020  124866956  TMEM132B/TMEM132C  T  0.60  0.65  4.30×10⁻⁴  1.21  1.09-1.34
33  12  rs10846982  124869291  TMEM132B/TMEM132C  A  0.60  0.64  4.75×10⁻⁴  1.2  1.08-1.34
33  12  rs2345779  124883777  TMEM132B/TMEM132C  T  0.60  0.64  8.64×10⁻⁴  1.19  1.07-1.33
33  12  rs1859943  124892521  TMEM132B/TMEM132C  G  0.61  0.65  8.73×10⁻⁴  1.19  1.08-1.33
33  12  rs729632  124906253  TMEM132B/TMEM132C  G  0.45  0.49  1.26×10⁻³  1.18  1.07-1.31
33  12  rs733361  124911705  TMEM132B/TMEM132C  G  0.58  0.64  6.96×10⁻⁷  1.3  1.17-1.44
33  12  rs929423  124912967  TMEM132B/TMEM132C  C  0.58  0.64  5.42×10⁻⁷  1.3  1.17-1.45
34  14  rs879027  76562644  C14orf4/KIAA1737  G  0.38  0.43  9.51×10⁻⁴  1.19  1.07-1.32
34  14  rs10873301  76576958  C14orf4/KIAA1737  G  0.43  0.50  4.42×10⁻⁷  1.3  1.17-1.44
34  14  rs4021419  76589446  C14orf4/KIAA1737  C  0.42  0.48  2.04×10⁻⁶  1.28  1.15-1.42
34  14  rs8022640  76594340  C14orf4/KIAA1737  G  0.49  0.54  5.36×10⁻⁵  1.24  1.12-1.37
34  14  rs4903539  76602441  C14orf4/KIAA1737  T  0.46  0.52  5.50×10⁻⁵  1.23  1.12-1.37
34  14  rs3813543  76651234  C14orf4/KIAA1737  G  0.44  0.49  5.78×10⁻⁴  1.2  1.08-1.33
34  14  rs888066  76659936  C14orf4/KIAA1737  C  0.44  0.49  1.14×10⁻³  1.18  1.07-1.31
35  14  rs162171  80330130  C14orf145  T  0.23  0.29  4.55×10⁻⁶  1.31  1.17-1.48
35  14  rs327434  80340824  C14orf145  C  0.23  0.29  2.70×10⁻⁶  1.32  1.18-1.49
35  14  rs162174  80364009  C14orf145  C  0.23  0.29  5.08×10⁻⁶  1.31  1.17-1.48
35  14  rs7158936  80385545  C14orf145  A  0.23  0.29  2.65×10⁻⁶  1.32  1.18-1.49
35  14  rs2556611  80415919  C14orf145  A  0.27  0.32  9.81×10⁻⁶  1.29  1.15-1.44
35  14  rs12050151  80438570  C14orf145  C  0.19  0.25  2.51×10⁻⁸  1.42  1.26-1.61
35  14  rs2217177  80484907  C14orf145/TSHR  C  0.26  0.33  8.43×10⁻⁹  1.39  1.24-1.56
35  14  rs2371462  80490527  C14orf145/TSHR  C  0.69  0.74  7.70×10⁻⁶  1.29  1.16-1.45
35  14  rs8022600  80491176  C14orf145/TSHR  T  0.26  0.33  4.80×10⁻⁹  1.4  1.25-1.56
35  14  rs179247  80502299  TSHR  A  0.67  0.73  1.73×10⁻⁶  1.31  1.17-1.47
35  14  rs179249  80504952  TSHR  C  0.24  0.31  2.05×10⁻¹⁰  1.45  1.29-1.63
35  14  rs2284720  80512920  TSHR  G  0.24  0.31  1.63×10⁻¹⁰  1.45  1.3-1.63
35  14  rs2284722  80514120  TSHR  A  0.25  0.32  1.15×10⁻¹⁰  1.45  1.3-1.63
35  14  rs3783949  80518135  TSHR  C  0.66  0.72  3.45×10⁻⁷  1.34  1.2-1.49
35  14  rs12101261  80520982  TSHR  T  0.65  0.71  6.70×10⁻⁷  1.32  1.18-1.47
35  14  rs10145099  80526447  TSHR  C  0.67  0.73  6.49×10⁻⁷  1.33  1.19-1.49
35  14  rs17545038  80527325  TSHR  C  0.23  0.31  1.56×10⁻¹⁰  1.46  1.3-1.64
35  14  rs4903964  80538707  TSHR  A  0.61  0.66  3.50×10⁻⁵  1.25  1.12-1.39
35  14  rs3783943  80541199  TSHR  T  0.62  0.68  6.96×10⁻⁶  1.28  1.15-1.42
35  14  rs2300525  80567146  TSHR  C  0.26  0.31  1.95×10⁻⁵  1.28  1.14-1.43
35  14  rs17111394  80592881  TSHR  C  0.21  0.25  1.85×10⁻⁵  1.3  1.16-1.47
35  14  rs2268474  80594160  TSHR  C  0.26  0.31  2.04×10⁻⁵  1.28  1.14-1.43
36  15  rs1472631  61210792  LACTB/RPS27L  T  0.80  0.85  2.80×10⁻⁵  1.33  1.16-1.52
36  15  rs12899931  61222249  LACTB/RPS27L  C  0.45  0.50  6.79×10⁻⁵  1.23  1.11-1.36
36  15  rs2729812  61223681  LACTB/RPS27L  C  0.79  0.83  6.72×10⁻⁵  1.3  1.14-1.48
36  15  rs953978  61232197  LACTB/RPS27L  T  0.48  0.53  9.22×10⁻⁵  1.22  1.1-1.35
37  15  rs2053424  64777286  LCTL/SMAD6  A  0.57  0.63  1.50×10⁻⁵  1.26  1.13-1.39
38  16  rs7184784  87697672  ACSF3  C  0.45  0.50  1.04×10⁻⁴  1.22  1.11-1.36
38  16  rs4530136  87729306  ACSF3  A  0.41  0.47  1.39×10⁻⁶  1.29  1.16-1.43
39  18  rs605314  65081241  CCDC102B/DOK6  C  0.28  0.33  2.16×10⁻⁵  1.27  1.14-1.42
40  19  rs12610223  6759656  VAV1  G  0.16  0.20  3.67×10⁻⁵  1.33  1.16-1.52
41  19  rs10425613  59681048  CDC42EP5/LAIR2  C  0.05  0.08  5.12×10⁻⁵  1.54  1.25-1.90
42  20  rs6033777  13555380  TASP1  T  0.27  0.32  7.35×10⁻⁵  1.26  1.12-1.41
42  20  rs2423729  13566382  TASP1/ESF1  G  0.27  0.32  5.31×10⁻⁵  1.26  1.13-1.41
43  20  rs16981330  19863251  RIN2  G  0.75  0.79  8.25×10⁻⁵  1.27  1.13-1.44
43  20  rs16981333  19864144  RIN2  T  0.74  0.79  9.31×10⁻⁵  1.27  1.13-1.43

Note: RAF, reference allele frequency; SNPs marked with blue failed in probe synthesis for replication and those with red were from five potential candidate regions with 1 × 10⁻⁴ < P < 1 × 10⁻³ selected for replication.
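As a reading aid for the OR and 95% CI columns, the sketch below computes an allelic odds ratio with a Woolf (log-based) confidence interval from the reference allele frequencies and the stage 1 sample sizes. It is an approximation under stated assumptions: the table does not say how the odds ratios were estimated, allele counts are reconstructed from rounded RAFs, and the function name is hypothetical.

```python
import math

def allelic_odds_ratio(case_raf, control_raf, n_cases, n_controls, z=1.96):
    """Allelic OR for the reference allele with a Woolf confidence interval."""
    # Approximate allele counts (two alleles per individual)
    case_ref = case_raf * 2 * n_cases
    case_alt = (1 - case_raf) * 2 * n_cases
    ctrl_ref = control_raf * 2 * n_controls
    ctrl_alt = (1 - control_raf) * 2 * n_controls
    odds_ratio = (case_ref * ctrl_alt) / (case_alt * ctrl_ref)
    # Standard error of log(OR) from the four cell counts
    se = math.sqrt(1 / case_ref + 1 / case_alt + 1 / ctrl_ref + 1 / ctrl_alt)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)

# rs9355610: control RAF 0.47, case RAF 0.54; 1,468 cases and 1,490 controls
or_, low, high = allelic_odds_ratio(0.54, 0.47, 1468, 1490)
print(f"OR = {or_:.2f} (95% CI {low:.2f}-{high:.2f})")
```

For rs9355610 this gives an odds ratio near the tabulated 1.32 with an interval close to 1.19-1.46; small differences are expected because the input frequencies are rounded to two decimals.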