Supplementary Figure 1. Principal components analysis of European ancestry in the African American, Native Hawaiian and Latino populations.

[Figure: panels a-c plot Eigenvector 2 (y-axis) against the eigenvector proportional to European ancestry (x-axis) for African Americans, Native Hawaiians and Latinos, each shown together with European Americans. The remaining panels plot the eigenvector from the colorectal cancer study (y-axis) against the European ancestry estimate from the prostate study (x-axis) for each admixed population, with the fitted regression line.]

Supplementary Figure 1: (a-c) To estimate the proportion of European ancestry from each of the 3 admixed populations, we used a panel of ancestry informative markers chosen specifically for each population (43 for African Americans, 39 for Native Hawaiians and 4 for Latinos, as described in Supplementary Methods). Principal components analysis was performed combining the samples from the admixed population with European Americans, to assist in the identification of an eigenvector proportional to ancestry. Eigenvector 1 (x-axis) is proportional to European ancestry, whereas Eigenvector 2 is shown merely to help with visualization and is not significantly correlated with ancestry. (d-f) We also plot the estimate of European ancestry from our published prostate cancer study (x-axis) against the eigenvector-based estimate of European ancestry from the colorectal cancer candidate gene study (y-axis), for the samples from each of the three admixed populations that overlap between the two data sets.
We used linear regression to relate the estimates in the two studies (shown in figure), allowing us to obtain a European ancestry estimate that is comparable across studies.

Supplementary Figure 2. Tag SNP coverage at 128.47-128.54 Mb in the Japanese, West African and European American HapMap populations.

[Figure: three panels (Japanese, West African, European American), each plotting the maximum r² value to a genotyped SNP (y-axis) against physical position in megabases on chromosome 8 (x-axis).]

Supplementary Figure 2: Maximum correlations (r²) between the 82 SNPs genotyped from 128.47-128.54 Mb and all SNPs of >5% minor allele frequency in the Japanese, West Africans and European Americans in the HapMap database. Each blue diamond represents a SNP from HapMap (the red diamond is rs6983267). These tags capture 93%, 92% and 95% of the common SNPs in each population, respectively, with r² > 0.8.
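The coverage percentages quoted in the caption can be read as the fraction of common SNPs whose best correlation with any genotyped tag exceeds the r² threshold. The following is a minimal sketch of that bookkeeping only; the function name `tag_coverage` and the example r² values are hypothetical and are not taken from the study.

```python
def tag_coverage(max_r2, threshold=0.8):
    """Fraction of common SNPs whose maximum r^2 to a tag exceeds threshold.

    max_r2: one value per common SNP, the largest r^2 between that SNP
    and any genotyped tag SNP (hypothetical input for illustration).
    """
    if not max_r2:
        raise ValueError("need at least one SNP")
    tagged = sum(1 for r2 in max_r2 if r2 > threshold)
    return tagged / len(max_r2)


# Hypothetical maximum r^2 values for five common SNPs in one population.
example_max_r2 = [1.0, 0.95, 0.81, 0.62, 0.99]
coverage = tag_coverage(example_max_r2)
print(f"{coverage:.0%} of common SNPs tagged at r^2 > 0.8")
```

In practice the per-SNP maximum r² values would come from an LD calculation on the HapMap genotypes, which is outside the scope of this sketch.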

Supplementary Methods

Study Populations

The Multiethnic Cohort Study: The initial testing of the six risk alleles was conducted in the Multiethnic Cohort Study (MEC). The MEC consists of over 215,000 men and women in Hawaii and Los Angeles (with additional African Americans from elsewhere in California). The cohort comprises predominantly African Americans, Japanese Americans, Native Hawaiians, Latinos and European Americans who entered the study between 1993 and 1996 by completing a 26-page self-administered questionnaire that requested detailed information about dietary habits, demographic factors, personal behaviors, history of prior medical conditions, family history of common cancers, and, for women, reproductive history and exogenous hormone use. The participants were between the ages of 45 and 75 at enrollment. Incident cancers in the MEC are identified by cohort linkage to population-based Surveillance, Epidemiology and End Results (SEER) cancer registries covering Hawaii and Los Angeles County, and to the California State cancer registry covering all of California. From the registries, information about stage of disease and site of tumor (colon versus rectum) is available. Beginning in 1994, blood samples were collected from incident colorectal cancer cases and a random sample of MEC participants to serve as a control pool for genetic analyses in the cohort. Eligible cases in the colorectal cancer case-control study consisted of men and women with incident invasive colorectal cancer diagnosed after enrollment in the MEC through December 2004. Controls were participants without colorectal, breast or prostate cancer prior to entry into the cohort and without a diagnosis up to December 2004 (the pool of controls was shared across our studies). The colorectal cancer case-control study in the MEC consisted of ,4 invasive colorectal cancer cases and 4,67 controls (2,63 males were also included in our prostate cancer study, ref. 2).
We excluded 34 European Americans and 6 Japanese Americans due to discrepancies between self-reported and inferred ancestry (see below), leaving ,24 colorectal cancer cases and 4,573 controls for analysis. Staging information was available for 97.7% (n=,98) of cases and tumor location information was available on % of cases. This study was approved by the Institutional Review Boards at the University of Southern California and at the University of Hawaii.

The Hawaii Case-Control Study: SNPs rs6983267 and rs954 were also examined in a population-based case-control study of colorectal cancer that includes 327 invasive colorectal cancer cases and 525 controls. All the samples were from Japanese Americans and European Americans; Native Hawaiians were part of the study, but were excluded from our analyses because they were not genotyped for ancestry informative markers, which were used as covariates in our analysis. This Hawaii Case-Control Study has been described in detail previously (ref. 3). Cases were identified through the Hawaii SEER registry and consisted of Japanese American, European American and Native Hawaiian residents of Oahu, Hawaii, who were newly diagnosed with colon or rectal cancer between January 1994 and August 1998. Controls were selected from participants in an ongoing population-based health survey conducted by the Hawaii State Department of Health and from Health Care Financing Administration participants. Staging information was available on 98.8% (n=323) of cases and tumor location information was available on % of cases. A personal interview and a blood sample were obtained from each subject. This study was approved by the Institutional Review Board at the University of Hawaii.

The Los Angeles County Case-Control Study: SNPs rs6983267 and rs954 were also examined in a population-based case-control study of colorectal cancer that includes 356 invasive colorectal cancer cases and 43 controls who were European Americans.
Non-European American case and control subjects were part of this study, but were not included in this analysis because they were not genotyped for ancestry informative markers, which were used as covariates in our analysis. Cases were identified from the Los Angeles County Cancer Surveillance Program. Eligible cases included English-speaking women with a histologically confirmed incident colorectal cancer, diagnosed at ages 55 to 74 years from January 1998 through December 2002, who were residents of Los Angeles County at the time of diagnosis. Controls were selected from the neighborhoods where cancer cases resided at the time of diagnosis using a well-established, standard algorithm to identify neighborhood controls used in numerous other case-control studies. A personal interview and a blood sample were obtained from each subject. Staging information was available on 99.7% (n=355) of cases and tumor location information was available on % of cases. This study was approved by the Institutional Review Board at the University of Southern California. The main results from this case-control study are described in a manuscript that is in preparation for submission (ref. 4).

SNP Genotyping

The variants were genotyped in the MEC samples using Sequenom and TaqMan platforms at the Broad Institute and at the University of Southern California (USC), respectively. Genotyping of rs6983267 and rs954 in the Hawaii case-control study was conducted at the University of Hawaii. Genotyping of these two variants in the Los Angeles Case-Control Study was conducted at USC. Blinded duplicate samples (~25%) were included in all 96-well DNA plates. The discordance rate among duplicates was <.5% in the MEC and % in the case-control studies.

Fine-Mapping of Region of Strong Linkage Disequilibrium Around rs6983267

To identify variants that could potentially capture additional risk for colorectal cancer, above and beyond rs6983267, we focused on the region from 128.47-128.54 Mb in strong linkage disequilibrium with this SNP.
For this purpose, we mined genotyping data from the HapMap West African, European American and East Asian populations for 86 SNPs across the region: 9 of these SNPs have data in the HapMap database, while 77 SNPs were identified by new genotyping and SNP discovery in HapMap samples in our prostate cancer study (see Supp. Table 4 of ref. 2 for the non-HapMap SNPs). We selected 96 SNPs that captured at r² > 0.8 all variants of >5% minor allele frequency in any of the populations. We used the Sequenom MassARRAY iPLEX Gold technology (ref. 5) at the Broad Institute to attempt to genotype 9 of these SNPs (the ones that we could successfully design with the technology) in ,7 colorectal cancer cases and ,844 controls from the MEC. After removing SNPs that failed quality control filters described previously (ref. 2), we were left with 82 SNPs with genotypes usable for our analysis. In Supplementary Figure 2, we present results on how SNPs of >5% minor allele frequency in HapMap are captured by this panel, in each of the West African, European American, and Japanese populations.

To prepare the data set for analysis, we removed 4 subjects (9 cases and 2 controls) with genotype call rates <6%. To assess genotyping quality, we used blinded duplicate samples (n=62) and HapMap trios (n=63). The concordance rate for these QC samples was >99.9%; there were only 3 discrepancies and no SNP had more than 2 discrepant genotype calls (<% error). All SNPs conformed to Hardy-Weinberg equilibrium in control samples for at least four of the five populations (P>.). The final data set thus included genotype data for 82 SNPs (including rs6983267 and rs7448), genotyped in ,88 cases and ,823 controls (cases/controls: African Americans, 25/4; Japanese Americans, 374/4; Native Hawaiians, 6/239; Latinos, 243/426; European Americans, 26/337).
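The Hardy-Weinberg check mentioned above compares observed genotype counts in controls with the counts expected from the allele frequency. The text does not say which test was used, so the sketch below assumes a Pearson chi-square test with 1 degree of freedom; the function name `hwe_chi_square` and the example counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) of Hardy-Weinberg proportions.

    Takes observed genotype counts (AA, AB, BB) and returns
    (statistic, p_value). The choice of test is an assumption.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Classes with zero expected count (monomorphic SNP) are skipped.
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical control genotype counts for one SNP in one population.
stat, p_value = hwe_chi_square(25, 50, 25)
print(f"chi-square = {stat:.2f}, P = {p_value:.3f}")
```

With small samples or rare alleles an exact test is often preferred over the chi-square approximation; this version is kept to the standard library for brevity.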

Statistical Analysis of the Six Variants at 8q24

Odds ratios (OR) and 95% confidence intervals were calculated using unconditional logistic regression. For each SNP, ethnic-specific odds ratios were estimated adjusting for gender, and pooled odds ratios were estimated adjusting for population and gender. In the African Americans, Native Hawaiians and Latinos we also controlled for the potential confounding effect of European genome-wide ancestry (estimated by principal components as described below). For analyses of rs6983267 and rs954 we also adjusted for study. Heterogeneity of effects by population or gender was examined by the inclusion of interaction terms in multivariate models. We examined the evidence for multiplicative allelic effects by comparing the fit between models that included genotype-specific covariates (separate indicator variables for homozygotes and heterozygotes) and genotype as a linear variable (0, 1 or 2 copies of the variant allele). We used a χ² test to evaluate significance. Interactions with age at diagnosis (≤67 vs. >67 years, centered on the median age), family history of colorectal cancer in first-degree relatives (yes or no), body mass index (<25 vs. ≥25 kg/m²), smoking history (ever vs. never), aspirin use (ever vs. never), alcohol consumption (<1 vs. ≥1 drink/day) and use of estrogen therapy among women (ever vs. never) were examined by a likelihood ratio test. Heterogeneity of effects by stage (localized vs. regional/distant) and site (colon vs. rectum) was examined by logistic regression in case-only analyses.

To calculate population attributable risk (PAR) we let $k_i$ represent the number of copies of the risk allele at rs6983267 ($i = 0, 1, 2$). We also let $P_i$ denote the proportion of controls in a given population with genotype $i$, and let $R_i = \exp(\beta k_i)$ denote the relative risk (odds ratio) for each genotype, where $\beta$ is the per-allele log odds ratio. The PAR for each population is then

$$\mathrm{PAR} = \frac{\sum_i P_i R_i - 1}{\sum_i P_i R_i}.$$
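As a worked illustration of the PAR definition above, the sketch below evaluates (sum of P_i R_i, minus 1) divided by the sum of P_i R_i, with R_i = exp(beta * k_i) for k_i = 0, 1, 2 risk-allele copies. The function name and the example genotype frequencies and odds ratio are hypothetical and are not values from the study.

```python
import math


def population_attributable_risk(genotype_freqs, log_or_per_allele):
    """PAR = (sum_i P_i R_i - 1) / sum_i P_i R_i, with R_i = exp(beta * k_i).

    genotype_freqs: control frequencies for 0, 1 and 2 risk-allele copies.
    log_or_per_allele: beta, the per-allele log odds ratio.
    """
    if abs(sum(genotype_freqs) - 1.0) > 1e-6:
        raise ValueError("genotype frequencies must sum to 1")
    weighted = sum(
        p * math.exp(log_or_per_allele * k)
        for k, p in enumerate(genotype_freqs)
    )
    return (weighted - 1.0) / weighted


# Hypothetical example: genotype frequencies 25% / 50% / 25% and a
# per-allele odds ratio of 1.2 (illustrative numbers only).
par = population_attributable_risk((0.25, 0.50, 0.25), math.log(1.2))
print(f"PAR = {par:.3f}")
```

With these illustrative inputs the weighted sum is 0.25 + 0.50 × 1.2 + 0.25 × 1.44 = 1.21, so the PAR is 0.21 / 1.21, or about 17%.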
To formally test whether the 6 SNPs differed from each other in terms of their ability to predict whether a case had prostate versus colorectal cancer (a null hypothesis of no difference whose rejection would indicate a different mechanism associated with some SNPs), we carried out a case-only analysis (limited to colorectal and prostate cancer cases with complete data for all six SNPs: ,799 prostate cases and ,64 colorectal cases). In this analysis, we compared the likelihood of the data for a model in which a common odds ratio was fit to the allele dosage for all 6 SNPs (a maximum of 12 risk alleles) to the likelihood of the data in a model in which 6 separate OR parameters were estimated, one for each SNP. Twice the difference in the log likelihoods was compared to a chi-square distribution with 5 degrees of freedom to assess statistical significance. In this analysis adjustment was made for age at diagnosis, reported ethnicity, and estimated European ancestry in the admixed African American, Native Hawaiian and Latino populations.

The statistical analysis of the fine-mapping data was performed using logistic regression controlling for population, gender and European genome-wide ancestry proportion for the three admixed populations (African Americans, Native Hawaiians and Latinos). We performed both single SNP analyses and stepwise regression analyses. For the stepwise procedure, we started with the SNP of interest (rs6983267 or rs88556), and then tested all other SNPs (n=81) in the model one at a time. This was also performed separately for each ethnicity.

Estimation of European Ancestry in African Americans, Native Hawaiians and Latinos

To estimate the percentage of European ancestry in each of the individuals from 3 admixed populations from the MEC (African Americans, Native Hawaiians, and Latinos), we combined two different data sets.
The first was collected for our study of prostate cancer susceptibility (ref. 2) (,547 African Americans, 223 Native Hawaiians and ,27 Latinos), and consisted of ancestry informative markers genotyped in each of the three populations. The second set was collected for a study of colorectal cancer susceptibility focusing on ,339 SNPs in candidate genes involved in DNA repair (85 African Americans, 45 Native Hawaiians, 842 Latinos, 92 European Americans, and ,224 Japanese Americans; in preparation). To combine the information from these two data sets to obtain a uniform estimate of ancestry, we used the fact that there were some overlapping samples between the two different studies (77 African American controls, 63 Native Hawaiian controls and 289 Latino controls). This allowed us to obtain equations relating the estimates of ancestry from the two data sets, leading to a single estimate proportional to European ancestry for 2,22 African Americans, 575 Native Hawaiians, and ,923 Latinos. The equations we used are shown in Supplementary Figure 1.

Ancestry-Informative Panels of Markers Extracted from the DNA Repair Gene Data Set

From the ,339 SNPs that had been genotyped for the DNA repair genes (all on chromosome 22), we were able to identify ancestry-informative sets of 43 SNPs in African Americans, 39 SNPs in Native Hawaiians, and 4 SNPs in Latinos. To obtain these panels, we first estimated the frequency of the variant allele of each SNP in all 5 MEC populations (African Americans, Japanese Americans, Native Hawaiians, Latinos and European Americans). We then selected markers that were widely spaced in the genome (at least .5 centimorgans apart) and that were maximally informative for estimating ancestry in each population of interest. For African Americans, we picked markers to be maximally differentiated in frequency between African Americans and European Americans. For Native Hawaiians, we picked markers to be maximally differentiated in frequency between Native Hawaiians and European Americans. For Latinos, we picked markers to be maximally differentiated in frequency between Latinos and European Americans.
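The marker selection described above combines two criteria: a large allele-frequency difference between the admixed population and European Americans, and a minimum genetic spacing between chosen markers. The text does not give the exact procedure, so the following is only one plausible reading, sketched as a greedy pass; the function name, the tuple layout, the spacing value and the example SNPs are all hypothetical.

```python
def select_ancestry_markers(snps, min_spacing_cm, max_markers):
    """Greedy pick of markers with the largest allele-frequency difference.

    snps: iterable of tuples
        (snp_id, chromosome, position_cm, freq_admixed, freq_european).
    A candidate is skipped if it lies within min_spacing_cm of an
    already chosen marker on the same chromosome.
    Returns the chosen SNP ids in order of selection.
    """
    # Rank candidates by absolute frequency difference, largest first.
    ranked = sorted(snps, key=lambda s: abs(s[3] - s[4]), reverse=True)
    chosen = []
    for snp in ranked:
        if len(chosen) >= max_markers:
            break
        too_close = any(
            snp[1] == c[1] and abs(snp[2] - c[2]) < min_spacing_cm
            for c in chosen
        )
        if not too_close:
            chosen.append(snp)
    return [s[0] for s in chosen]


# Hypothetical candidates: id, chromosome, cM position, two frequencies.
candidates = [
    ("snpA", "1", 10.0, 0.9, 0.1),
    ("snpB", "1", 10.2, 0.8, 0.1),  # close to snpA, so it is skipped
    ("snpC", "1", 15.0, 0.5, 0.2),
    ("snpD", "2", 10.1, 0.6, 0.1),
]
panel = select_ancestry_markers(candidates, min_spacing_cm=1.0, max_markers=3)
print(panel)
```

A greedy pass is simple but not guaranteed to give the most informative panel overall; it is used here only to make the two selection criteria concrete.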
Validation that the Markers are Ancestry Informative

In Supplementary Figure 1, we show that for each of the populations, the first principal component from application of Principal Components Analysis (PCA) (ref. 6) is highly informative for estimating European ancestry proportion in the admixed populations. To obtain these plots, we carried out PCA separately for each admixed population, combined with the European Americans, and plotted the first principal component against the second principal component. The first principal component clearly separates the self-identified European Americans from the admixed populations, and hence we used the value of this principal component for each individual as a number that is proportional to European ancestry.

As a second validation of the usefulness of these estimates, Supplementary Figure 1 shows a plot of the ancestry estimate obtained from this study against that used in our previous study of prostate cancer, which was based on a different set of ancestry informative markers. The estimates are highly correlated for samples that overlapped between the two studies, as expected if they capture the true proportion of ancestry in the admixed populations.

Ancestry Estimates made Comparable across Colorectal and Prostate Cancer Data Sets

We carried out linear regressions to relate the European ancestry estimate from the prostate cancer study to the European ancestry estimate from the colorectal cancer study, for the samples that had been genotyped in both panels of ancestry informative markers (77 African Americans, 63 Native Hawaiians, and 289 Latinos). The result of the regression analysis is shown by the trendline and equation in each of the three right panels in Supplementary Figure 1 (one equation for each population).
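The cross-study calibration described above amounts to fitting a straight line on the overlapping samples and then applying it to every prostate-study estimate. The sketch below assumes ordinary least squares with the prostate-study estimate as x and the colorectal-study eigenvector as y, matching the axes of the figure; the function names and the example values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def convert_estimates(estimates, slope, intercept):
    """Map prostate-study estimates onto the colorectal-study scale."""
    return [slope * e + intercept for e in estimates]


# Hypothetical overlapping samples: prostate-study ancestry estimate (x)
# and colorectal-study eigenvector value (y) for the same individuals.
overlap_prostate = [0.1, 0.2, 0.4, 0.8]
overlap_colorectal = [-0.25, -0.20, -0.10, 0.10]
slope, intercept = fit_line(overlap_prostate, overlap_colorectal)

# Apply the fitted line to samples genotyped only in the prostate study.
converted = convert_estimates([0.3, 0.6], slope, intercept)
print(slope, intercept, converted)
```

One line is fitted per admixed population, since each population has its own marker panel and its own eigenvector scale.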

We used the equation to convert the estimates of European ancestry for each population from the prostate cancer data set to estimates proportional to those from the colorectal cancer data set. We thus obtained a uniform number proportional to European ancestry for 2,22 African Americans, 575 Native Hawaiians, and ,923 Latinos. We used these numbers as covariates (proportional to genome-wide ancestry, allowing for stratification correction) in case-control analyses to detect SNP associations in each of the populations.

Identification of Outlier European American and Japanese American Samples

The first principal component plotted in panels (a-c) of Supplementary Figure 1 not only allows us to estimate the proportion of European ancestry in admixed populations, but also to identify European Americans who appear to have unusual ancestry compared with others of the same self-declared ancestry. We compiled a list of 29 European American samples (out of the total of 92) that appeared to be outliers compared with others that were labeled with the same ancestry. We identified these as individuals with values <.7 for the African American first principal component, <.2 for the Native Hawaiian first principal component, and <.25 for the Latino first principal component. We also identified additional outliers in the Japanese American samples, by repeating the analysis described above with a new panel of 39 markers chosen to be informative comparing Japanese American and European American ancestry (analysis not shown). This identified an additional Japanese Americans (out of a total of ,224 Japanese Americans genotyped in the colorectal cancer DNA repair gene data set) that appeared to be genetic outliers with respect to others of the same self-declared ancestry. Finally, we combined the outlier analysis above with the one from our previous study of prostate cancer, to obtain a combined list of ancestry outliers: 34 European Americans and 6 Japanese Americans.
We removed these subjects from all subsequent analysis.

References

1 Kolonel, L.N. et al. A multiethnic cohort in Hawaii and Los Angeles: baseline characteristics. Am J Epidemiol. 151, 346-357 (2000).
2 Haiman, C.A. et al. Multiple regions within 8q24 independently affect risk for prostate cancer. Nat Genet. 39, 638-644 (2007).
3 Le Marchand, L. et al. Combined effects of well-done red meat, smoking, and rapid N-acetyltransferase 2 and CYP1A2 phenotypes in increasing colorectal cancer risk. Cancer Epidemiol Biomarkers Prev. 10, 1259-1266 (2001).
4 Wu, A. et al. in preparation for publication.
5 Tang, K. et al. Chip-based genotyping by mass spectrometry. Proc. Natl. Acad. Sci. USA 96, 10016-10020 (1999).
6 Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 38, 904-909 (2006).

Supplementary Table : Association of rs6983267 and rs954 with colorectal cancer in the Hawaii and Los Angeles populationbased casecontrol studies Hawaii CaseControl Study Los Angeles CaseControl Study Marker / Position (Mb) Japanese Americans (2 cases / 367 controls) European Americans (6 cases / 58 controls) P Het a Pooled OR (95% CI) b European Americans (356 cases / 43 controls) rs6983267 28,482,487.32 (.3.7) 32%.4 (.8.6) 48%.5.26 (.3.54) P=.3.7 (.88.3) P=.48 rs954 28,6,39.7 (.78.48) 6%.3 (.57.85) 9%.9.6 (.8.4) P=.68.49 (.62.).2 Each square gives odds ratios (and 95% confidence intervals) for allele dosage effects along with the risk allele frequency in controls. a Pvalue testing for heterogeneity of allelic effects across populations in the Hawaii study; b OR adjusted for ethnic population and gender. SNPs rs6983267 and rs954 were genotyped successfully for!98% of cases and controls in the Hawaii study and!97% of cases and controls in the Los Angeles study.

Supplementary Table 2: Association of rs6983267 with colorectal cancer risk.
Columns: Population (cases/controls); TT; GT; GG; Effect per G allele; P-value
African Americans Ref.. (.363.39).57 (.524.7).37 (.98.9) (27 /,49) 3% 25% 72%.65
Japanese Americans Ref.. (.9.36).49 (.2.2).9 (.3.37) (592 /,564) 46% 43% %.6
Native Hawaiians Ref..25 (.652.4) 2.95 (.97.29).59 (.22.47) (6 / 347) 48% 43% 9%.39
Latinos Ref..3 (.67.59).45 (.942.25).26 (.2.55) (25 /,7) 6% 49% 35%.32
European Americans Ref..92 (.73.7).32 (..72).6 (.2.33) (686 /,544) 25% 5% 24%.29
All populations (,87 / 5,5) Ref..4 (.9.2).47 (.25.74).22 (.2.32) 4.4x 6
Males (87 / 2,7) Ref.. (.9.35).5 (.8.9).22 (.8.38).2x 3
Females (99 / 2,8) Ref..98 (.82.22).43(.4.79).2(.7.35).7x 3
Localized disease (92 / 5,5) Ref..3 (.94.36).53 (.23.9).24 (..38).3x 4
Regional or Distant disease (874 / 5,5) Ref..95 (.78.5).42 (.4.76).2 (.7.34).6x 3
Colon (,359 / 5,5) Ref..3 (.88.2).48 (.24.78).22 (..34) 2.4x 5
Rectum (448 / 5,5) Ref..8 (.84.37).44 (.8.93).2 (.3.39).6
Age ≤ 67 years (median) (85 / 2,88) Ref..9 (.89.34).32 (.4.68).5 (.2.3).24
Age > 67 years (957 / 2,63) Ref.. (.83.23).65 (.322.7).28 (.4.44) 2.5x 5
Family History of Colorectal Cancer in a First-Degree Relative Ref..93 (.62.38).94 (.93.4).36 (.7.74).3 (24 / 497)
No Family History of Colorectal Cancer in a First-Degree Relative (,397 / 4,45) Ref..4 (.89.22).4 (.7.7).9 (.8.3) 2.9x 4
All samples from the study are combined for this analysis. Each cell gives odds ratios (and 95% confidence intervals) for allele dosage effects, estimated using unconditional logistic regression, and the frequency of each genotype class in controls for each ethnic group. ORs are adjusted for gender in ethnic-specific analyses, for ethnicity in gender-specific analyses, and for both gender and ethnicity in pooled and stratified analyses. ORs estimated including the African Americans, Native Hawaiians and Latinos are also adjusted for genome-wide European ancestry.
Genotyping was successful for 97.5% of cases and 94.8% of controls.
P-values testing for heterogeneity of effects among cases and controls: ethnicity (P=.63), gender (P=.82), age (P=.26) and family history (P=.49). Information about first-degree family history of colon cancer was available for 9% of cases and 9% of controls.
P-values testing for heterogeneity of effects in case-only analyses: localized vs regional/distant disease (P=.7), and colon vs rectum (P=.74).
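The footnote above describes odds ratios for allele dosage effects estimated by unconditional logistic regression. As a rough illustration of that idea only, the following Python sketch fits a two-parameter logistic model (intercept plus allele dosage) to genotype counts by Newton-Raphson and reports the per-allele OR with a Wald 95% confidence interval. It is a minimal sketch, not the study's analysis: it omits the gender, ethnicity and ancestry adjustments, the function names are hypothetical, and the counts in the usage line are invented so that the odds double with each allele.

```python
import math


def _score_and_information(a, b, groups):
    """Score vector and information matrix of the grouped logistic likelihood."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for dosage, cases, total in groups:
        p = 1.0 / (1.0 + math.exp(-(a + b * dosage)))  # fitted case probability
        w = total * p * (1.0 - p)                       # binomial variance weight
        resid = cases - total * p
        g0 += resid
        g1 += dosage * resid
        i00 += w
        i01 += w * dosage
        i11 += w * dosage * dosage
    return g0, g1, i00, i01, i11


def per_allele_odds_ratio(case_counts, control_counts, z=1.96, max_iter=100):
    """Unadjusted per-allele OR and Wald CI from counts indexed by dosage 0, 1, 2.

    Hypothetical helper: inputs are (n_0, n_1, n_2) tuples for cases and controls.
    """
    groups = [(d, case_counts[d], case_counts[d] + control_counts[d])
              for d in range(3)]
    a = b = 0.0  # intercept and log odds ratio per allele
    for _ in range(max_iter):
        g0, g1, i00, i01, i11 = _score_and_information(a, b, groups)
        det = i00 * i11 - i01 * i01
        # Newton step: inverse of the 2x2 information matrix times the score.
        step_a = (i11 * g0 - i01 * g1) / det
        step_b = (i00 * g1 - i01 * g0) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < 1e-12:
            break
    # Standard error of b from the information matrix at the final estimate.
    _, _, i00, i01, i11 = _score_and_information(a, b, groups)
    se_b = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return math.exp(b), math.exp(b - z * se_b), math.exp(b + z * se_b)


# Invented counts: case/control odds are 0.5, 1 and 2 for dosage 0, 1 and 2.
odds_ratio, lower, upper = per_allele_odds_ratio((50, 100, 200), (100, 100, 100))
print(f"per-allele OR {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Coding the genotype as a count of risk alleles is what makes the estimate a single "effect per allele"; the genotype-specific ORs in the TT/GT/GG columns would instead come from a model with separate indicator terms for the heterozygote and homozygote classes.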

Supplementary Table 3. Allele frequencies and associations of 82 SNPs in Region 3 (28.47-28.54 Mb) with colorectal cancer risk in the MEC populations (,88 cases and ,823 controls).
Cases/Controls: African Americans, 25/4; Japanese Americans, 374/4; Native Hawaiians, 6/239; Latinos, 243/426; European Whites, 26/337.
Allele Frequency in Controls b Chi Square P-value P-value P-value
SNP ID a Position Call Rate Pooled Pooled Adj. for rs6983267 Adj. for rs88556
rs55479 284797.985.22...6.5.42.4.33.6.44.58.54
rs56287 2847954.986.88.455.7.298.86.62.69.6.3.28..96.26.42
rs55478 2847736.995.56.....7.48.49.47.38
rs2549845 2847289.99.37.63.82.247.254.3.2.9.4.34.36.55.4.6
rs44525 2847235.968.99.32.6.262.37.63 3.22.44.23.5.3.87.37.93
rs7844673 28472696.974.6.76.85.34.72.2.9.36.4.5..94.94.79
rs956365 2847369.988.74.5.27.7.56.26.48.2 2.37.4.3.86.62.48
rs783548 2847397.98.228..2.7.5..2.3.4.4.84.55.63
rs69247 28474254.988.229..4.26.5.23..35.86.22..92.88.84
rs55477 28476625.976.29.682.72.46.55 7.39.94 3.73 3.7.98 4.38.5x4.8.73
rs233437 28477246.974.84.76.2.2.37.2 7.88 2.3.54.3 3.67.6.78.69
rs55476 28477298.99.634.84.44.79.237 4..7 2.9.86.26 8.37 3.8x3.47.45
rs985829 2847844.982.29.8.27.272.28.89.7 3.63.29.2 5.4 8.6x5.6.46
rs88555 28478693.978.346.92.22.286.297.85 9.3 3.86.52.7 5.6.x4.7.45
rs2682374 28483.99.74.682.76.48.55 4.4 2.3 5.59 3.86.44 4.8.7x4.32.77
rs55475 2848639.978.94.6.3.23.52.5.48.2.87.5.4.84.26.6
rs88556 28482329.978.668.32.247.33.39 4.69 3.96 4.68 8.28.6 22.44 2.2x6 7.4x3
rs6983267 28482487.976.44.687.72.43.52.24 3.29 5.74 4.44.2 3.46 2.4x4.86
rs73278 2848474.979.529.96.22.39.338 4.4 8.2 4.28 4.48.3 2.9 4.8x6 9.9x3.25
rs7439 28484533.986.6...6..9.22.5.48.6.72
rs55474 28486686.977.775.32.25.343.47 2.7 3.7 4.38 5.6.38 5.9 9.7x5.3.8
rs32734 28488376.987.37....2 2.3.3 2.24.3.4.22
rs98696 28488689.994..8.32.4.2.92.5.42.3.3.2.88.6.42
rs26776 28489299.986.74.32.252.332.392 4.4 3.54 4.28 6.32.2 9.8.2x5 2.x2.3
rs3248944 2848974.986.5.6.3.6.5.6.5..77.3.35.56.3 3.5x2
rs487788 2849967.973.486.299.243.37.39.92 3.55 4.77 4.77.7 5..x4.9.37
rs956369 28492999.979.644.297.25.323.39 2.6 4.36 4.35 5.46.49 6.3 5.4x5.8.34
rs74346 28493974.99.4.92.29.3.335 2.24 8.96 4.2 4.79.78 9.3.3x5.6x2.3
Broad938345 28496959.992.52...7.2.7 2.35.2.34.25.29.48
rs487789 28497243.967.734.33.288.384.483 2.22 3.59 3.64 3.68.47 5.46 8.4x5.5.56
Broad9384 2849793.982.34....2.89.2.3.83.36.23.62
rs64759 284987.955.47.224.76.267.37.9.7 2.28 7.2.3 7.4 6.5x3.6.85
rs6475 28498842.989.28.2.8.84.22.59.64.86 9.3.6.25 8.x4.5.26
rs7842552 285876.987.382.8.86.98.26 2.85 6.93.24..57.76.38.47.25
rs23753 285388.972.276.225.96.27.273.6.6.59.8.75.3.85.39.
rs9297756 2859349.98.89.34.53.29.28.2 3.56.3.92.82.44.5.55.37
rs6995633 2859833.982.27.68.87.222.76.33.5.6.23.95..92.23.59
rs6999789 28543.983.87.4.24.79.84.7 2.5.5.9.37.86.7.49.6
rs7448 285352.978.627.25.222.299.363.4.2.5.24.3.95.33.64.79
rs6982665 28543.973.378.233.25.29.269.94.3..3.4..92.26.8
rs7357486 28585.972.3.233.22.26.275.8.3.28 2.32.47.5.69.5 3.9x2
Broad93855 285857.98.5.....5 2.2.3.86.99.9
Broad938559 285226.984.254.9.9.9.84 2.62 2.2.57.68.65 4.35 3.7x2.24.3x2
rs5632 2858885.97.345.2.228.288.359.66.36.85.9.4 3..8.34.22
rs328578 2859269.996.52..2.7.2.67...37.3.79.8.3.36
rs64752 285994.963.324.2.226.276.355.66.3.2.8.94 3.4.6.28.2
Broad938876 2852324.98.22..2.4...7.2..2.88.95.85
rs698397 28524836.977.46.74.62.374.39.76.9.33.28.39.36.24.75.83
Broad93893 2852492.989.54..2.5. 2.97.2.54 4.92 2.7x2 3.x3 3.9x2
rs74 285252.97.573.26.24.298.363..98.75. 2.79 5.32 2.x2.23.4
rs2862368 2852553.973.643.28.244.298.373.7.9.28..25 4.34 3.7x2.3.2
rs72462 28526872.982.288.264.289.438.44.2 2.54.2.85...92.35.33
Broad9396 28526977.969.74.6.33.6.72.2.32.7 5.92.2 3.74.5.2 6.8x3
Broad9395 28527247.975.38.6.82.63.26.98.4.4.78. 3.86.5..8x2
rs9622 28527333.977.54.2.8.8.29.92.47..78.94 3.52.6.2.5
rs6996874 2852749.986.438.2.82.6.228.9.74.3 2.7.8 3.87 4.9x2.8.4x2
Broad9398 2852852.993.62...2.2.8.4.3.4.52.63.45
rs64757 28529586.976.73..48.99.63.3.2.5.6.28.2.65.46.67
rs784228 28536.99.74.83.29.3.264.62.28.2.37.22.2.29.7.27
rs9459 2853789.976.344.77.35.36.268.23.86.4.3...94.8.95
rs79294 285378.99.222.8.5.34.34.24.5.57.24.22.25.62.33.36
Broad939263 2853943.979.9.9.3.74.75.5 2.5.23.52.43.7.4.74.2
Broad93935 2853383.973.424.258.34.54.558.6 2.5.5.7.2..32.53.33
Broad939378 285343.929.455.59.554.322.78 2.6.28.5.3.2..95.59.67
rs2845337 28534369.98.95.48.2.68.267.52.77.7.8.4.8.67.48.28
rs964322 28534669.983.462.593.535.329.77.56.59.33.83..3.87.6.65
rs7836345 28534862.973.5.628.553.388.273.99.9.94.86.7.5.82.4.58
Broad939524 2853573.982.52...2.2.68...52.47.22.24
Broad939636 28535734.98.67...4..5.7.3.3.8.78.93.95
rs783 2853586.97.247.65.85.399.45..4.8.9.85..96.75.77
rs784264 28535996.976.69.595.542.298.65.94.5..26.29.7.68.7.83
rs7767 28536936.99.53.437.32.534.536 2.6 2.4.63.65.58.78.8.7.6
rs9995 285376.989.243.28.8.28.225.96. 2.6. 2.26.3.3.7.25
rs79 28537265.98.28....3.6.3.6.28.6.57.5
Broad939763 28537526.979.29.4.48.72.9.68.7.74..2.2.27.3.4
rs399977 2853868.974.282.3.49.4.237.4.52.78.7.9.5.7.54.74
rs69948 2853889.979.92.3.95.79.89.88 4.26.9.8 6.33.74.39.57.6
rs956372 28539438.975.758.382.48.35.35.7.42.9.3.5..9.89.77
Broad939966 28539443.993.76..2.5.2...7.3.3.87.9.9
Broad939976 28539576.984.43...4..34.4.23.27.26.35
Broad93998 2853973.97.43...2..2.3.22.64.68.78
rs7825928 2853979.976.2.53.559.599.628.27.69.83.4.23.8.67.97.99
a SNPs labeled "BroadXXXXXXXX" were identified by resequencing as described in ref. 2.
b Minor allele based on all populations combined. Populations: African Americans; Japanese Americans; Native Hawaiians; Latinos; European Whites.

Supplementary Table 4: Association of rs88556 and rs73278 with colorectal cancer risk in the MEC (,88 cases, ,823 controls).
Columns: Marker / Position (Mb); African Americans a (25 cases / 4 controls); Japanese Americans (374 cases / 4 controls); Native Hawaiians a (6 cases / 239 controls); Latinos a (243 cases / 426 controls); European Americans (26 cases / 337 controls); Colorectal Cancer Pooled OR b (95% CI)
rs88556 28,482,329.37 (.3.82) 67%.23 (..5) 3%.63 (.32.57) 25%.44 (.4.84) 33%.8 (.9.53) 39%.32 (.8.49) P=2.2x 6
rs73278 28,484,74.32 (.2.72).4 (..77).62 (.22.58).33 (.4.69).6 (.89.5).32 (.7.49) 53% 2% 22% 3% 34% P=4.8x 6
Each cell gives odds ratios (and 95% confidence intervals) for allele dosage effects along with the risk allele frequency in controls. All ORs are adjusted for gender.
a OR also adjusted for genome-wide European ancestry (African Americans, Native Hawaiians and Latinos).
b OR adjusted for population and genome-wide European ancestry (African Americans, Native Hawaiians and Latinos).
P-values testing for heterogeneity of effects: rs88556, P=.53; rs73278, P=.73. SNPs rs88556 and rs73278 were genotyped successfully for ≥98% of cases and ≥97% of controls.

Supplementary Table 5: Linkage disequilibrium between the 9 risk variants evaluated in this study at 8q24 (r² upper, D′ lower diagonal)
Columns: Polymorphism (Position); Region; Population; rs3254738; rs698356; Broad93495; rs32865; rs88556; rs6983267; rs73278; rs7448; rs954
Rows:
rs3254738 (28,73,525) Region 2
rs698356 (28,76,62) Region 2
Broad93495 (28,2,99) Region 2
rs32865 (28,424,8) (breast cancer)
rs88556 (28,482,329) Region 3
rs6983267 (28,482,487) Region 3
rs73278 (28,484,74) Region 3
rs7448 (28,5,352) Region 3
rs954 (28,6,39) Region
.6..87.4.58..3.5.5.8.6.9.3.24..7.3.2.9.2.6.5.26.9.6.9.2..8.5.2.8.3.3.44.9....3.33.5.6.2.5.2.43.9.69.7.8.6.9.3.4.29.4.8.3.49..5.9.5.8.2.4.42.3.3.57.26.5.2..6.8.25.3.5.6.25.9.9.6.4.24..4.3.5.7.3.3.8.5.3..3.2...4.2.99.99..99.99......6.6.39.24.3.25.4.35.4.2.2..5.5.33.94.83.34.63.98...99..66.62.55.48.45.67..38.2.2..3.55.6.84.87.8.8.57.7.3.5.9.22.27.5.3.6.9.34.3..2..2.6.2.25.2.4.2..25.2.6..4.7.2..2....
r² and D′ values were estimated among cases and controls combined using the Haploview program (http://www.broad.mit.edu/mpg/haploview/). LD between the 4 variants in the fine-mapped region (28.47-28.54 Mb) is outlined with a heavy black line. Populations: African American; Japanese American; Native Hawaiian; Latino; European American.
.6.24.2.32.77.3
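For readers unfamiliar with the two pairwise LD measures tabulated above, the following Python sketch computes r² and the normalized coefficient D′ for two biallelic loci from known haplotype and allele frequencies. It is a minimal sketch under stated assumptions: the function name and the example frequencies are hypothetical, and it takes phased haplotype frequencies as given, whereas a program working from unphased genotype data must first estimate those frequencies.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    # Raw disequilibrium: observed haplotype frequency minus its
    # expectation under independence.
    d = p_ab - p_a * p_b

    # Largest magnitude D can reach given the allele frequencies;
    # the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if d_max == 0 or denom == 0:
        # A monomorphic locus carries no LD information.
        return 0.0, 0.0

    d_prime = abs(d) / d_max
    r_squared = d * d / denom
    return d_prime, r_squared


# Hypothetical frequencies: allele B only ever occurs on A-bearing haplotypes.
d_prime, r_squared = ld_stats(p_ab=0.2, p_a=0.5, p_b=0.2)
print(f"D' = {d_prime:.2f}, r^2 = {r_squared:.2f}")
```

In this example D′ is at its maximum while r² is only 0.25, because the two alleles differ in frequency. This is why a table can show values near 1 in the D′ triangle alongside much smaller values in the r² triangle for the same pair of variants.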