Supplementary Methods

Population ascertainment and characterization

Our genotyping strategy included 3 stages of SNP selection, with individuals from 3 populations (Europeans, Indian Asians and Mexicans). In stage one, we performed genome-wide association scans in 1005 European men and 1006 Indian Asian men from the London Life Sciences Population (LOLIPOP) study, UK. In stage two, we genotyped 1822 SNPs in 859 UK European women, 1181 UK Indian Asian women, 968 Mexican men and 1560 Mexican women. In the final stage we validated 32 SNPs in 5968 European male and female subjects. Collection of each cohort was approved by the relevant Institutional Ethics Committees, and all subjects gave written informed consent.

European and Indian Asian subjects genotyped in stages one and two

LOLIPOP is a cohort study of cardiovascular health in men and women registered with family practitioners in West London, recruited with a response rate of 62%. Characteristics of subjects are shown in Tables S1 and S2. Europeans were recruited if all 4 grandparents were born in the UK; Indian Asians if all 4 grandparents were born in the Indian Subcontinent. The assessment of participants was carried out by trained research nurses according to a standardized protocol. An interviewer-administered questionnaire was used to collect data on medical history, family history, current prescribed medication (verified from the practice computerized records), cardiovascular risk factors, and alcohol intake. Country of birth of participants, parents, and grandparents was recorded together with language and religion for assignment of ethnic subgroups. Physical assessment included blood pressure and anthropometric measurements (height, weight, waist circumference). Blood pressure was taken as the mean of 3 measurements recorded using an Omron 705, with the subject seated, over at least 10 min. Waist circumference was measured using a non-stretchable tape measure, at the midpoint between iliac crest and costal margin. Blood was collected after an 8 hour fast for biochemical analysis, including glucose, total and HDL cholesterol and triglycerides. Aliquots of whole blood were collected into EDTA, and stored at −80 °C.

Subjects were assigned case/control status based on ATP III criteria,¹ with cases having three or more and controls having two or fewer of the following risk factors: abdominal obesity (waist circumference ≥ 101.6 cm for men, or ≥ 88.9 cm for women), high triglyceride levels (> 1.695 mmol/l), low HDL cholesterol levels (< 1.036 mmol/l for men, or < 1.295 mmol/l for women), high blood pressure (systolic ≥ 130 and diastolic ≥ 85 mmHg, or documented medication for hypertension), and high fasting glucose levels (≥ 6.105 mmol/l, or previously diagnosed diabetes). For males selected for stage one, we identified cases and controls matched 1:1 for age, smoking status, and alcohol intake. For females selected for stage two, we took all available samples. Characteristics of subjects genotyped in stages one and two are shown in Tables S1 and S2, respectively.

Mexican subjects genotyped in stage two

Mexican men and women were recruited by the Instituto Nacional de Ciencias Médicas y Nutrición in Mexico City, Mexico. Subjects were identified from outpatient lipid, internal medicine, diabetes, thyroid, irritable bowel, and osteoarthritis clinics, and from local factories. Characteristics of the genotyped subjects are shown in Table S2.

Europeans genotyped in stage three

Subjects were recruited as part of the TNT study,² and consisted of males and females between 35 and 75 years of age who had clinically evident CHD. CHD was defined here as having previous myocardial infarction, previous or current angina with objective evidence of atherosclerotic CHD, or a history of coronary revascularization. Characteristics of genotyped subjects are shown in Table S3.

SNP characteristics

The SNPs used for the whole genome scans in stage one were chosen to provide good coverage of common genetic variation across the genome. Different criteria were used for selection of the SNP sets for the two scans, reflecting technical improvements as well as improvements in our understanding of the correlation structure of human genetic variation, and resulting in different but overlapping SNP sets. For the European scan, we selected a total of 266,722 common tag SNPs with MAF ≥ 0.10 based on pairwise linkage disequilibrium (LD) determined in samples of European ancestry.³ For the Indian Asian scan, we selected common haplotype-defining SNPs from a multi-ethnic map,⁴ supplemented with validated SNPs from dbSNP to obtain more uniform spacing. A total of 248,537 SNPs with an average spacing of 13.5 kb were chosen. There were 98,588 SNPs in common between the two platforms.

We analyzed the genomic coverage achieved by the SNP sets used in the two genome-wide scans using LD data from the Phase II HapMap⁵ for the CEU sample. We do not have genome-wide LD information for an Indian Asian population, but recent results indicate that the HapMap CEU panel is most appropriate for tag SNP selection in Indian Asians,⁶ and the selection of SNPs for the Indian Asian scan was not narrowly targeted by ancestry. For each HapMap SNP with MAF ≥ 10%, we identified the assayed SNP with maximum r², for the two whole genome SNP sets. Using HapMap data for 10 ENCODE regions of 500 kb, we determined that 58% of SNPs with MAF ≥ 0.1 had an assayed SNP with r² ≥ 0.8 in the Indian Asian scan, and the best proxy had an average r² of 0.76. In the European scan, 73% of HapMap SNPs had an assayed SNP with r² ≥ 0.8, and the best proxy had an average r² of 0.84. Coverage based on the subsets of SNPs that were successfully genotyped was roughly 5% lower in each case.

Genomic coverage estimates based on the ENCODE regions have been shown to be nearly equivalent to estimates based on the entire Phase II HapMap.⁷ Evaluating coverage of our SNP sets across the entire HapMap would introduce several sources of bias. Most of the Phase II HapMap genotypes were determined by Perlegen using the same technology used in this study, and SNP selection for the HapMap was informed by the same linkage disequilibrium data used to choose tag SNPs in our European genome scan.³ The ENCODE regions offer the advantage of nearly complete platform-neutral ascertainment of common variants.

Genotyping

Sample preparation

Whole-genome amplification was performed on samples with less than 35 µg genomic DNA as described elsewhere.⁸ Multiplex PCR reactions were set up as follows (per reaction): 10 ng of genomic DNA was amplified using ~220-plex PCR primer pairs (0.1 µM of each primer), 3.25 mM dNTPs, 12.5X Titanium Taq (Clontech), 0.19 M Tricine, 3.35X MasterAmp PCR Enhancer with Betaine (Epicentre Biotechnologies), 0.235% DMSO, 0.3 M KCl, 0.37 M Trizma, 0.1 M (NH4)2SO4, and 0.02 M MgCl2, in a volume of 6 µl. Thermocycling was performed using a 9700 cycler (Perkin-Elmer) as follows: 5 minutes at 96 °C; 55 cycles of 96 °C for 2 seconds and 53 °C for 2 minutes per cycle, then 15 minutes at 50 °C. Excess unincorporated nucleotides were dephosphorylated using Shrimp Alkaline Phosphatase (SAP), and purified using a 3K 96-well filter plate (Pall Scientific) fitted onto a vacuum manifold with a pressure of > 25 cm Hg. 16 µg of each purified pooled PCR product was labeled with 40 nmol each of biotin-16-ddUTP and biotin-16-dUTP (Perkin Elmer) using 1400 units of recombinant TdT (Roche). 7.4 µl of 10 mg/ml herring sperm DNA was added to each DNA sample and the samples were denatured at 99 °C for 20 minutes.

Array Hybridization

The labeled PCR products were hybridized to the high-density oligonucleotide arrays at 50 °C overnight in the following conditions: 70 ng/µl DNA, 3 M TMACl, 10 mM Tris pH 8.0, 0.01% Triton X-100, 0.05 nM control oligo, and 0.42 mg/ml herring sperm DNA. After a brief wash in 1X MES buffer, the arrays were incubated with 5 µg/ml streptavidin (Sigma Aldrich) for 15 minutes at 25 °C, followed by 1.25 µg/ml biotinylated anti-streptavidin antibody (Vector Laboratories) for 10 minutes at 25 °C, and then 1 µg/ml streptavidin-Cy-chrome conjugate (Molecular Probes) for 15 minutes at 25 °C. After a final wash in 0.2X SSPE for 30 minutes at 37 °C, the arrays were scanned with a custom-built confocal laser scanner to measure the Cy-chrome fluorescence of the hybridized labeled sample. The intensities of the perfect-match and mismatch allele features were used to determine the genotypes of the SNPs.

Data filtering

After quality filters were applied to the stage one data, there were 221,658 and 190,220 SNPs with a call rate of at least 90% that were polymorphic in the European and Indian Asian scans, respectively. Tests for association were performed for a smaller set of 216,774 SNPs in the European scan and 180,410 SNPs in the Indian Asian scan, with Hardy-Weinberg equilibrium P > 0.001. These sets had average call rates of 98.9% and 98.6%, respectively. The arrays used in the European scan performed better because these SNPs were chosen from a set that had already performed well on similarly designed arrays, and were specifically chosen for optimal coverage in a panel of European ancestry.

For stage two, 922 SNPs were genotyped for triglycerides, and 900 for HDL cholesterol. Of these, 689 and 656 respectively were polymorphic and passed QC (call rate > 90% and Hardy-Weinberg P > 10⁻⁹). For each phenotype, we tested just the subset of SNPs that had been selected for genotyping based on prior data for that phenotype in the whole genome stages of the project. In stage three, the average call rate for the 32 SNPs was 98.2%.

Statistical Analysis

Genotypes were coded as allele counts (0, 1, 2) in linear regression models. This corresponds to fitting an additive model where each allele copy makes the same incremental contribution to the phenotype. Models included adjustment for age, alcohol (stages one and two), gender (stages two and three) and CHD status (stage one). Triglycerides and HDL cholesterol were transformed prior to analysis using either a 1/sqrt or a log transformation to remove skew. The primary test for association consisted of a comparison of the variance explained by the full model versus the variance explained by a model without the genotype term.

In the stage one scans, we did not see strong evidence for population stratification in the test results. Using Genomic Control,⁸ variance inflation factors (λ) determined for the various phenotypes were ≤ 1.07, so we did not make corrections for population structure.

In stage two, we used principal components analysis (PCA) to characterize population structure.⁹ The first two principal components effectively capture the reported ancestry of the samples (Fig. S1). The third component identified a subset of SNPs whose genotypes were unusually sensitive to experimental variability in the genotyping process. The existence of this type of artifact and the ability of PCA to detect it have been previously noted.⁹,¹⁰ This component was correlated with variability in overall brightness of a microarray, which is sensitive to experimental variation in the fragmentation, hybridization, and staining processes. While the genotype calling algorithm attempts to correct for systematic scan-level variation, the correction is less successful for some SNPs.
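
The additive coding and nested-model comparison described under Statistical Analysis can be illustrated with a minimal sketch. This is not the authors' code: the function names (`solve`, `rss`, `genotype_test`), the toy data, and the choice to report the fraction of residual variance attributable to genotype are assumptions made for illustration, and only the Python standard library is used.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(design, y):
    """Residual sum of squares of an ordinary least-squares fit of y on design."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((v - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, v in zip(design, y))


def genotype_test(y, allele_counts, covariates):
    """Compare a covariates-only model with a covariates + genotype model.

    Genotypes enter as allele counts (0, 1, 2), i.e. an additive model.
    Returns (F statistic, fraction of residual variance explained by genotype).
    """
    reduced = [[1.0] + [float(c) for c in row] for row in covariates]
    full = [row + [float(g)] for row, g in zip(reduced, allele_counts)]
    rss_reduced, rss_full = rss(reduced, y), rss(full, y)
    df_resid = len(y) - len(full[0])
    f_stat = (rss_reduced - rss_full) / (rss_full / df_resid)
    return f_stat, (rss_reduced - rss_full) / rss_reduced


# Toy data (hypothetical): one covariate (age) and allele counts for 8 subjects.
ages = [30, 40, 50, 60, 35, 45, 55, 65]
counts = [0, 1, 2, 1, 0, 2, 1, 0]
noise = [0.01, -0.01, 0.01, -0.01, -0.01, 0.01, -0.01, 0.01]
phenotype = [1.0 + 0.5 * g + 0.01 * a + e for g, a, e in zip(counts, ages, noise)]
f_stat, r2 = genotype_test(phenotype, counts, [[a] for a in ages])
print(f_stat, r2)
```

A P value would follow from referring the F statistic to an F distribution with 1 and n − p degrees of freedom, where p is the number of parameters in the full model.
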
Additional components generally seemed to be associated with individual elements of local linkage disequilibrium structure, rather than correlations across unlinked markers. Hence we included just the top three components as covariates in the regression models.

To improve the accuracy of P values and false discovery rate estimates in stage two, we also employed genomic control to measure and eliminate limited amounts of residual variance inflation in the test statistics. For each phenotypic analysis, we computed variance inflation factors using the complementary set of SNPs genotyped in stage two that had not been selected specifically for association testing on that phenotype. We first transformed P values to χ² statistics, and then computed the inflation factor as the median test statistic divided by 0.455. This yielded λ = 1.05 for triglycerides and λ = 1.10 for HDL cholesterol. Residual variance inflation may be due to population structure not accounted for by the top principal components, or violations of assumptions of the parametric models (i.e. homoscedasticity, normality of residuals).

Stage three involved analysis of 32 SNPs in a cohort of men and women of European ancestry. Results of linear regression analyses in stage three were combined with results of stage two, using Fisher's method.¹¹ This approach was used instead of a joint analysis, because we did not genotype a sufficient number of SNPs in stage three to model population structure in that set of samples. We did not include stage one data in the combined analysis, as SNPs tested in stages two and three were not consistently present in the stage one scans.

For many of our reported associations, we identified multiple SNPs in the same genomic interval. To determine the extent to which these associations were independent, as opposed to indirect associations due to linkage disequilibrium, we performed analyses of two-SNP models in the stage two and stage three data.
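
The genomic control and Fisher combination steps described above can be sketched as follows. This is an illustrative reading, not the authors' implementation: the function names are hypothetical, the rescaling in `gc_adjust` (dividing each χ² statistic by the inflation factor and never deflating when the factor is below 1) is the conventional form of genomic control and is assumed here, and P values are assumed to be two-sided and strictly greater than zero.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
CHI2_1DF_MEDIAN = 0.455  # median of a chi-square distribution with 1 d.f.


def p_to_chi2(p):
    """Convert a two-sided P value (0 < p <= 1) to a 1-d.f. chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z


def inflation_factor(null_p_values):
    """Median chi-square of SNPs not selected for the phenotype, divided by 0.455."""
    return median([p_to_chi2(p) for p in null_p_values]) / CHI2_1DF_MEDIAN


def gc_adjust(p, lam):
    """Divide the chi-square statistic by lam (assumed: no deflation if lam < 1)."""
    z = math.sqrt(p_to_chi2(p) / max(lam, 1.0))
    return 2.0 * _STD_NORMAL.cdf(-z)


def fisher_combine(p_values):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k d.f. under the null.

    For even degrees of freedom the upper tail has a closed form, so no
    special-function library is needed.
    """
    half_x = -sum(math.log(p) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, len(p_values)):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical usage: adjust a stage two P value with lambda = 1.10, then
# combine it with a stage three P value for the same SNP.
stage_two_p = gc_adjust(1e-4, 1.10)
combined_p = fisher_combine([stage_two_p, 0.01])
print(stage_two_p, combined_p)
```
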
Given a pair of SNPs from Table 1 in the same genomic interval, we determined whether one SNP still accounted for a significant amount of variance after conditioning on the other SNP. This analysis cannot prove independence, since two assayed markers may be in low LD with one another but both be in LD with a hidden causal variant. But it can identify sets of markers consistent with just one causal variant. Since we did not attempt to obtain comprehensive coverage of common variants in these intervals, we think this pairwise analysis is more appropriate than a haplotype-based approach.

In the triglyceride analysis, three regions were identified with multiple significant associations: the MLXIPL region with four SNPs, the LPL region with five SNPs, and the APO cluster region with four SNPs. For the MLXIPL region, conditioning on rs3812316 accounted for association at rs12056034, rs17145732, and rs799160 (stage two: P=0.83, P=0.27, P=0.35; stage three: P=0.13, P=0.14, P=0.03). For the LPL region, conditioning on rs328 accounted for association at rs325, rs17410914, and rs4406409 (P=0.57, P=0.07, P=0.15). There was weak evidence for residual association at rs326 (P=0.004), but rs328 still accounted for most of the variance attributable to rs326. For the APO region, conditioning on rs1558861 accounted for associations at rs2075292, rs7124741, and rs17120139 (stage two: P=0.10, P=0.17, P=0.15; stage three: P=0.17, P=0.10, P=0.08).

We examined the association of rs3812316 with other metabolic phenotypes. Although rs3812316 was associated in stage two with HDL (P=0.0002), hypertension (P=0.02) and metabolic syndrome (P=0.01), these relationships were confounded by the correlations of these phenotypes with triglycerides. After adjustment for triglycerides, the associations were no longer significant. In stage three, the association of rs3812316 with these phenotypes did not replicate. The relationship of rs3812316 with triglycerides was not influenced by BMI or gender.

In the HDL analysis, two regions were identified with multiple significant associations: the LPL region with three SNPs, and the CETP region with six SNPs.
For the LPL region, in stage two, tests for conditional association indicated that the three SNPs were not independent, but no SNP stood out as most informative (for all two-SNP models, P>0.05 for a second SNP effect). In stage three, rs326 was most informative, and conditioning on this SNP abolished associations at rs325 and rs328 (P=0.85, P=0.93). For the CETP region, in stage two, conditioning on rs7205804 accounted for associations at rs2217332, rs711752, and rs5882 (P=0.27, P=0.05, P=0.10). The signal at rs5880 was partially independent of rs7205804 (P = 3.2 × 10⁻⁷), and rs1800777 was accounted for by rs5880 (P=0.89). In stage three, we saw a similar pattern, though in this case rs711752 was best at accounting for rs2217332, rs7205804, and rs5882 (P=0.09, P=0.87, P=0.22). Again, the signal at rs5880 was partially independent of rs711752 (P = 3.7 × 10⁻⁸) and accounted for rs1800777 (P=0.19).

Lipid lowering with statins may have influenced the genotype-phenotype relationships. Although we did not adjust for statin use in stages one and two (where usage averaged 37% and 10% respectively), measurements for stage three were taken after a wash-out period with no statin usage, and therefore are unaffected by treatment confounding.

To address the question of SNP interactions, for triglyceride and HDL analyses, we examined all two-SNP models for the SNPs identified in Table 1, in the stage two data. We included additive main effects for both SNPs plus a multiplicative interaction term. We assessed significance of the interaction term by analysis of variance after accounting for main effects. For both phenotypes, we found no interactions that were significant after Bonferroni correction. We repeated the analyses with genotypes coded as factors rather than allele counts, with the same result.
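
As a compact restatement of the two-SNP analyses described in this section, the nested models can be written out explicitly. The notation is introduced here for illustration and is not taken from the original text: y is the transformed phenotype, c the vector of covariates, g₁ and g₂ the allele counts (0, 1, 2) at the two SNPs, RSS a residual sum of squares, n the number of subjects, and q the number of parameters added by the larger model (q = 1 when genotypes are coded as allele counts).

```latex
\begin{align}
M_1 &: \quad y = \beta_0 + \boldsymbol{\gamma}^{\top}\mathbf{c} + \beta_1 g_1 + \varepsilon \\
M_2 &: \quad y = \beta_0 + \boldsymbol{\gamma}^{\top}\mathbf{c} + \beta_1 g_1 + \beta_2 g_2 + \varepsilon \\
M_3 &: \quad y = \beta_0 + \boldsymbol{\gamma}^{\top}\mathbf{c} + \beta_1 g_1 + \beta_2 g_2
              + \beta_{12}\, g_1 g_2 + \varepsilon
\end{align}

% Conditional test: M_1 versus M_2 (H_0: \beta_2 = 0), i.e. does SNP 2 explain
% variance beyond SNP 1?
% Interaction test: M_2 versus M_3 (H_0: \beta_{12} = 0).
\begin{equation}
F = \frac{(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}})/q}
         {\mathrm{RSS}_{\text{full}}/(n - p_{\text{full}})}
\end{equation}
```

Here p_full is the number of parameters in the larger model of each comparison. With m SNP pairs tested, a Bonferroni threshold of 0.05/m would apply to the interaction tests; the value of m is not stated in the text.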

References

1. Grundy, S. M. et al. Diagnosis and management of the metabolic syndrome: an American Heart Association/National Heart, Lung, and Blood Institute scientific statement. Circulation 112, 2735-2752 (2005).
2. LaRosa, J. C. et al. Intensive lipid lowering with atorvastatin in patients with stable coronary disease. N. Engl. J. Med. 352, 1425-1435 (2005).
3. Hinds, D. A. et al. Whole-genome patterns of common DNA variation in three human populations. Science 307, 1072-1079 (2005).
4. Patil, N. et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294, 1719-1723 (2001).
5. International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299-1320 (2005).
6. Conrad, D. F. et al. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nat. Genet. 38, 1251-1260 (2006).
7. Pe'er, I. et al. Evaluating and improving power in whole-genome association studies using fixed marker sets. Nat. Genet. 38, 663-667 (2006).
8. Bacanu, S.-A., Devlin, B. & Roeder, K. The power of genomic control. Am. J. Hum. Genet. 66, 1933-1944 (2000).
9. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904-909 (2006).
10. Clayton, D. G. et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nat. Genet. 37, 1243-1246 (2005).
11. Fisher, R. A. Statistical methods for research workers, 13th ed. Oliver & Boyd, London (1925).
12. Saxena, R. et al. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science (2007).

Supplementary Figure S1. Principal components analysis of stage two data. Results are shown for samples projected onto the top two principal components, colored by reported ancestry. The scaling of the two axes is essentially arbitrary. The shape of the Mexican cluster is consistent with PC1 quantifying a sample's European ancestry in this admixed population.
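
A projection of the kind shown in Figure S1 could be computed along the following lines. This is a minimal sketch under stated assumptions, not the pipeline used in the study: the normalization (centering each SNP and scaling by sqrt(p(1 − p))) is a common convention for genotype PCA, the eigenvectors are found by plain power iteration with deflation, and the function names and toy genotypes are hypothetical.

```python
import math


def standardize(genotypes):
    """Center each SNP column and scale by sqrt(p * (1 - p)); drop monomorphic SNPs."""
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2.0 * n)  # allele frequency estimated from allele counts
        if 0.0 < p < 1.0:
            sd = math.sqrt(p * (1.0 - p))
            columns.append([(g - 2.0 * p) / sd for g in col])
    return [[c[i] for c in columns] for i in range(n)]  # samples x SNPs


def top_components(genotypes, k=2, iters=200):
    """Return the top k sample-space principal components (unit-length vectors)."""
    z = standardize(genotypes)
    n = len(z)
    gram = [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(n)]
            for i in range(n)]
    components = []
    for _ in range(k):
        # Fixed non-constant start vector; assumed not orthogonal to the target.
        v = [float(i + 1) for i in range(n)]
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n))
                         for i in range(n))
        components.append(v)
        # Deflate so the next pass converges to the following component.
        for i in range(n):
            for j in range(n):
                gram[i][j] -= eigenvalue * v[i] * v[j]
    return components


# Toy genotypes (hypothetical): two groups of three samples that differ at the
# first three SNPs; the fourth SNP varies within both groups.
toy = [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 2],
       [2, 2, 2, 0], [2, 2, 2, 1], [2, 2, 2, 2]]
pc1, pc2 = top_components(toy, k=2)
print(pc1)
print(pc2)
```

In this toy example the first component separates the two groups, which mirrors how the top components in Figure S1 track reported ancestry. Component scores such as these are what would enter a regression model as covariates.
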

Supplementary Table S1. Demographic and clinical features of samples used in stage one

                                  Indian Asian scan           European scan
                                  Cases        Controls       Cases        Controls
Subjects                          501          505            499          506
Mean age, years                   52.6         52.4           55.7         55.4
Waist circumference, cm           105.2        93.2           109.7        92.0
Triglyceride level, mmol/l        2.66         1.32           2.61         1.09
HDL cholesterol, mmol/l           1.09         1.31           1.09         1.42
Systolic blood pressure, mmHg     142.7        131.7          142.8        131.3
Diastolic blood pressure, mmHg    87.8         80.6           86.1         78.0
Glucose, mmol/l                   7.38         5.28           6.70         5.21
CHD (%)                           228 (45.5)   135 (26.7)     256 (51.3)   191 (37.8)
Alcohol (%)
  0 units per week                248 (49.5)   267 (52.9)     143 (28.7)   79 (15.6)
  1-10 units per week             113 (22.6)   120 (23.8)     133 (26.7)   143 (28.3)
  > 10 units per week             140 (27.9)   118 (23.4)     223 (44.7)   284 (56.1)

Supplementary Table S2. Demographic and clinical features of samples used in stage two

                                  Indian Asian Females      European Females          Mexican Males             Mexican Females
                                  Cases        Controls     Cases        Controls     Cases        Controls     Cases        Controls
Subjects                          412          769          156          703          422          542          796          762
Mean age, years                   52.0         50.4         58.7         52.1         51.8         46.8         56.4         45.2
Waist circumference, cm           101.5        91.8         103.3        86.6         101.3        87.9         96.2         79.1
Triglyceride level, mmol/l        1.91         1.24         2.30         1.10         3.49         2.27         2.76         1.52
HDL cholesterol, mmol/l           1.18         1.44         1.27         1.67         0.89         1.02         1.09         1.30
Systolic blood pressure, mmHg     131.9        121.5        139.3        123.0        132.3        122.1        135.8        118.7
Diastolic blood pressure, mmHg    81.9         76.2         82.2         75.6         85.3         79.6         86.2         76.7
Glucose, mmol/l                   6.64         5.22         6.62         5.04         7.59         5.26         7.09         4.93
CHD (%)                           32 (7.8)     12 (1.6)     14 (9.0)     11 (1.6)     38 (9.0)     17 (3.1)     51 (6.4)     12 (1.6)
Alcohol (%)
  0 units per week                398 (96.6)   719 (93.5)   103 (66.0)   295 (42.0)   386 (91.5)   499 (92.1)   792 (99.5)   748 (98.2)
  1-10 units per week             12 (2.9)     44 (5.7)     41 (26.2)    243 (34.6)   14 (3.3)     22 (4.1)     1 (0.1)      12 (1.6)
  > 10 units per week             2 (0.5)      6 (0.8)      12 (7.7)     165 (23.5)   22 (5.2)     21 (3.9)     3 (0.4)      2 (0.3)

Supplementary Table S3. Demographic and clinical features of samples used in stage three

                                  Males                        Females
                                  Cases         Controls       Cases         Controls
Subjects                          1765          2882           553           513
Mean age, years                   59.9          60.9           63.4          63.5
Body Mass Index, kg/m²            31.1          26.7           31.9          26.5
Triglyceride level, mmol/l        2.80          1.91           2.94          2.04
HDL cholesterol, mmol/l           1.08          1.30           1.28          1.58
Systolic blood pressure, mmHg     133.5         128.4          136.3         130.7
Diastolic blood pressure, mmHg    79.8          77.2           77.9          76.1
Glucose, mmol/l                   6.64          5.50           6.89          5.36
CHD (%)                           1765 (100.0)  2882 (100.0)   553 (100.0)   513 (100.0)

Supplementary Table S4. Stage One results for SNPs tested in Stage Three

Column groups: Indian Asian Scan; European Scan
SNP ID  Freq  Effect  SE  P  Freq  Effect  SE  P

Triglycerides
rs12056034  0.893  +0.1636  0.0390  3.0 × 10⁻⁶
rs17145732  0.811  +0.0880  0.0311  0.0026  0.720  +0.0305  0.0301  0.28
rs3812316
rs799160  0.683  +0.0890  0.0256  0.0014  0.473  +0.0548  0.0266  0.055
rs325
rs326  0.760  +0.0808  0.0278  0.0020  0.711  +0.1368  0.0289  7.7 × 10⁻⁵
rs328  0.898  +0.1800  0.0436  6.3 × 10⁻⁵
rs17410914  0.917  +0.1133  0.0428  0.016  0.900  +0.1899  0.0437  2.3 × 10⁻⁵
rs4406409
rs1558861  0.194  +0.1905  0.0297  4.6 × 10⁻¹⁰
rs2075292  0.329  +0.1109  0.0250  2.2 × 10⁻⁵
rs7124741  0.333  +0.1092  0.0242  8.8 × 10⁻⁶
rs17120139  0.347  +0.1499  0.0275  6.2 × 10⁻⁸

HDL Cholesterol
rs326  0.760  0.0080  0.0055  0.14  0.711  0.0253  0.0059  1.8 × 10⁻⁵
rs9282541
rs11858164  0.506  0.0171  0.0047  3.3 × 10⁻⁴
rs2217332
rs711752  0.555  0.0338  0.0053  3.3 × 10⁻¹⁰
rs7205804  0.512  0.0114  0.0050  0.022
rs5880
rs5882  0.673  0.0143  0.0057  0.013
rs1800777

Freq, risk allele frequency; Effect, effect size for the first listed allele from Table 1, in log units; SE, standard error of the effect size; P, a two-sided P value from an ANOVA for significance of the genotype effect. P values significant at the P<0.05 level after Bonferroni correction for the total number of SNPs tested are highlighted.

Supplementary Table S5. Residual phenotypic variance (r²) accounted for by replicated associations

Locus               SNP           Stage two    Stage three
HDL Cholesterol
  LPL               rs326         0.0019       0.0046
  LIPC              rs11858164    0.0024       0.0038
  CETP              rs7205804     0.0350       0.0117
Triglyceride Level
  MLXIPL            rs3812316     0.0045       0.0043
  LPL               rs326         0.0056       0.0045
  APO A1/C3/A4/A5   rs1558861     0.0171       0.0071

SNP rs780094 in GCKR (associated with triglycerides in a previous GWA study¹²) was modestly associated with triglycerides in both current genome scans (P=0.02 and P=0.006), with the same direction of effect. However, these P values did not satisfy the criteria for inclusion in stage two.