A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer


Supplementary Information

Julius Gudmundsson, Patrick Sulem, Daniel F. Gudbjartsson, Gisli Masson, Bjarni A. Agnarsson, Kristrun R. Benediktsdottir, Asgeir Sigurdsson, Olafur Th. Magnusson, Sigurjon A. Gudjonsson, Droplaug N. Magnusdottir, Hrefna Johannsdottir, Hafdis Th. Helgadottir, Simon N. Stacey, Adalbjorg Jonasdottir, Stefania B. Olafsdottir, Gudmar Thorleifsson, Jon G. Jonasson, Laufey Tryggvadottir, Sebastian Navarrete, Fernando Fuertes, Brian T. Helfand, Qiaoyan Hu, Irma E. Csiki, Ioan N. Mates, Viorel Jinga, Katja K. H. Aben, Inge M. van Oort, Sita H. Vermeulen, Jenny L. Donovan, Freddy C. Hamdy, Chi-Fai Ng, Peter K.F. Chiu, Kin-Mang Lau, Maggie C.Y. Ng, Jeffrey R. Gulcher, Augustine Kong, William J. Catalona, Jose I. Mayordomo, Gudmundur V. Einarsson, Rosa B. Barkardottir, Eirikur Jonsson, Dana Mates, David E. Neal, Lambertus A. Kiemeney, Unnur Thorsteinsdottir, Thorunn Rafnar, Kari Stefansson.

Content: Supplementary Figures 1-2, Supplementary Tables 1-5, Supplementary Note 1

Supplementary Figure 1. Functional features from the ENCODE project and variants tagging the new 8q24 association signal. Shown at the top are the five variants that tag the new association signal, which are discussed in the main text and listed in Supplementary Table 1b, together with ENCODE features. The interval shown is on chromosome 8q24 between Mb and Mb (Build 36). The figure was adapted from the UCSC Genome Browser.

Supplementary Figure 2. The three pedigrees containing the five homozygous carriers of the 8q24 risk allele (rs [a]). Squares indicate males and circles females; filled symbols indicate prostate cancer. Also shown are the years of birth and death, as well as the age at diagnosis. A slash through a symbol indicates that the subject is deceased. The generations in each pedigree are numbered from top to bottom with Roman numerals, and individuals in each generation are numbered with Arabic numerals from left to right. The table following each pedigree gives the carrier status of subjects from whom DNA samples are available (Dir = directly genotyped) or who were inferred to be obligate carriers (Ob). Non-carriers are indicated with a minus (-) sign, heterozygous carriers with a plus (+) sign, and homozygous carriers are pointed out with an arrow and marked with a hash (#) sign.

Supplementary Figure 2 continued (legend as above).

Supplementary Figure 2 continued (legend as above).

Supplementary Table 1a. Previously published cancer risk SNPs on 8q24, with the correlation (r²) and linkage disequilibrium (D') between each of them and rs

Variant | Position_B36 (bp) | r² | D' | Cancer site/type | Reference
rs | | | | Prostate | 1
rs | | | | Prostate | 1, 2
rs | | | | Prostate | 1
rs | | | | Prostate | 1
rs | | | | Prostate | 3
rs | | | | CLL | 4
rs | | | | Prostate | 5
rs (rs620861) | | | | Prostate | 1, 5, 6
rs | | | | Breast | 7
rs | | | | Prostate / colorectal cancer | 8 / 9-11
rs | | | | Prostate | 12
rs | | < | na | Urinary bladder | 13

Shown are results based on whole-genome sequence data from 1,795 Icelanders. The correlation (r²) and linkage disequilibrium (D') shown in the table are between the variants listed in column one and rs. CLL stands for chronic lymphocytic leukemia.

Supplementary Table 1b. Correlation (r²) and linkage disequilibrium (D') between rs and the other variants significant after steps 1 and 2 of the logistic regression

Variant | Position_B36 (bp) | r² | D' | Risk allele | Other allele | Control freq. of risk allele
rs | | | | A | T |
rs | | | | G | A |
INDEL# | | | | G | GATAA |
rs | | | | G | A |
rs | | | | A | G |

Shown are results based on whole-genome sequence data from 1,795 Icelanders.
# The INDEL is located on 8q24 between bp and bp in Build36/hg18.
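To make the two LD measures in Supplementary Table 1 concrete, here is a minimal sketch of how D' and r² can be computed for two biallelic sites from phased haplotypes. It assumes each haplotype is coded as a pair (allele at site 1, allele at site 2) with alleles 0/1; the function name `ld_stats` is hypothetical, and this is not the software used to produce the table.

```python
from collections import Counter


def ld_stats(haplotypes):
    """Return (D', r2) for two biallelic sites from phased haplotypes.

    Each haplotype is a pair (a, b) of 0/1 alleles at site 1 and site 2.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at site 1
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at site 2
    p_ab = counts[(1, 1)] / n                # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                     # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else float("nan")
    return d_prime, r2


# A rare allele (2%) that always sits on a common background (10%).
haps = [(1, 1)] * 2 + [(0, 1)] * 8 + [(0, 0)] * 90
d_prime, r2 = ld_stats(haps)  # D' is 1.0 up to rounding; r2 is about 0.18
```

In this toy example the rare allele never occurs without the common one, so D' is at its maximum while r² stays low; this is the usual pattern when a low-frequency variant is compared with common tag SNPs.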

Supplementary Table 2. Stepwise logistic regression association results in Iceland for previously published prostate cancer risk SNPs located on 8q24 and the 5 variants tagging the new association signal

Columns: SNP; Position B36 (bp); unadjusted results; adjusted results step-1 (rs ); adjusted results step-2 (rs ) (rs ); adjusted results step-3 (rs ) (rs ) (rs ). Each result is given as OR and P-value.

rs E rs * E E E-11 na na rs E rs E E rs E E E rs * E E E-14 na na rs E E-12 na na na na rs * E E E-13 na na rs * E E E-13 na na INDEL* # E E E-10 na na rs E E E rs E E E E-04 rs E E E E-03 rs E-19 na na na na na na

Shown are unadjusted and adjusted imputed association results for 2,315 Icelandic patients and 54,444 Icelandic controls genotyped using commercial Illumina SNP chips. In step-1 of the logistic regression, results were adjusted for the most significant SNP in our GWAS, rs located in region-1 (r² = 0.95 between rs and rs ). In step-2 we conditioned on two SNPs: the SNP used in step-1 and the most significant SNP in step-1, rs located in region-2 (r² = 0.64 between rs and rs , the SNP initially reported in region-2). In step-3 we added rs to the list of SNPs conditioned on.
*Variants tagging the new association signal on 8q24 discussed in the main text.
# The INDEL (located on 8q24 between bp and bp in Build36/hg18) has the major allele GATAA and the minor allele G (minor allele control frequency in Iceland = 1.2%).
na: the variant is itself included as a covariate in the logistic regression at that step, so no result is applicable for it.
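Each step of Supplementary Table 2 amounts to adding the previously selected SNPs as covariates and testing the next SNP with a likelihood ratio test, the approach described under case-control association testing in the Supplementary Note. The sketch below illustrates one such step using only the Python standard library. It assumes allele dosages coded 0 to 2, fits the model by plain Newton-Raphson with a fixed number of iterations, and uses hypothetical function names; it is an illustration, not the authors' software.

```python
import math


def _solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def logistic_loglik(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            w = p * (1.0 - p)
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for q in range(k):
                    info[j][q] += w * xi[j] * xi[q]
        # Newton step: beta <- beta + (X'WX)^-1 X'(y - p)
        beta = [b + s for b, s in zip(beta, _solve(info, grad))]
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        ll += yi * eta - math.log1p(math.exp(eta))
    return ll


def conditional_test(y, test_snp, conditioned=()):
    """Likelihood ratio test for one SNP, adjusting for already-selected SNPs."""
    null_X = [[1.0] + [float(snp[i]) for snp in conditioned] for i in range(len(y))]
    full_X = [row + [float(test_snp[i])] for i, row in enumerate(null_X)]
    stat = max(0.0, 2.0 * (logistic_loglik(full_X, y) - logistic_loglik(null_X, y)))
    # Upper tail of a 1-df chi-square: P(chi2 > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

Usage: `conditional_test(status, dosages_b, conditioned=[dosages_a])` tests SNP b after adjusting for SNP a. Adding a third SNP to `conditioned` corresponds to moving from step-2 to step-3. A signal that is merely a proxy for a conditioned SNP should lose most of its significance, whereas an independent signal should not.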

Supplementary Table 3. Summary association results for rs [g] on 8q24 and prostate cancer

Study population | Cases (n) | Controls (n) | Freq. cases | Freq. controls | OR (95% CI) | P value
Iceland | 4,537 | 54,444 | | | (2.22, 3.27) | 1.50E-23
Chicago | | | | | (1.04, 2.81) |
Spain | | | | | (0.51, 5.68) | 0.38
The Netherlands | | | | | (1.60, 3.82) | 4.60E-05
Romania | | | | | (1.12, 5.19) |
UK | | | | | (2.29, 7.32) | 2.00E-06
All excl. Iceland | 5,399 | 7,… | | | (1.87, 3.14) | 3.10E-11
All combined | 9,936 | 61,… | | | (2.22, 3.03) | 4.10E-33

Phet = 0.33; I² = 13.3

All P values shown are two-sided. Shown are the numbers of cases and controls (n), the allele frequencies of the variant in affected and control individuals, the allelic odds ratio (OR) with its 95% confidence interval (95% CI) and the P value. Also shown are the P value for heterogeneity of the ORs (Phet) across all study groups and I², which lies between 0% and 100% and describes the proportion of total variation in study estimates that is due to heterogeneity. For the combined study populations, the reported control frequency is the unweighted average of the control frequencies of the individual populations, while the OR and the P value were estimated using the Mantel-Haenszel model. Of the Icelandic cases, 2,315 had been genotyped using one of the Illumina chips, and 2,222 additional patients had at least partial data based on family-based imputation. Of the Icelandic controls, 27,780 were imputed based on chip genotypes and 26,664 by family-based imputation. All non-Icelandic replication samples were directly genotyped.
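The combined OR in Supplementary Table 3 was estimated with the Mantel-Haenszel model, and heterogeneity was summarized by Phet and I². A minimal sketch of both quantities follows. It assumes one 2x2 allele-count table per population, written as (case risk, case other, control risk, control other), and computes Cochran's Q with Woolf (inverse-variance) weights, which requires every cell to be non-zero. The exact weighting the authors used for Q is not stated, the P value for Q is not computed here, and the function names are hypothetical.

```python
import math


def mantel_haenszel_or(tables):
    """Pooled OR over strata; each table is (a, b, c, d) =
    (case risk, case other, control risk, control other) allele counts."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def cochran_q_i2(tables):
    """Cochran's Q and I^2 (in %) from per-stratum log ORs with Woolf variances."""
    log_ors, weights = [], []
    for a, b, c, d in tables:
        log_ors.append(math.log(a * d / (b * c)))
        weights.append(1.0 / (1 / a + 1 / b + 1 / c + 1 / d))  # 1 / Var(log OR)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2
```

With identical tables, Q is 0 and I² is 0%. I² becomes positive only when Q exceeds its degrees of freedom, which is why a modest Phet such as 0.33 can accompany a small I² such as 13.3.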

Supplementary Table 4a. Association results for aggressive and non-aggressive prostate cancer for rs [a] on 8q24

Study group | P value | OR (95% CI) | Aggr. disease: cases (n) | Aggr. disease: freq. | Non-aggr. disease: cases (n) | Non-aggr. disease: freq.
Iceland | | (0.97, 2.15) | | | |
Chicago | | (0.72, 2.35) | | | |
The Netherlands | | (0.55, 2.65) | | | |
Romania | 0.61 | inf (0.0, inf) | | | |
Spain | 0.14 | inf (0.2, inf) | | | |
UK | | (0.22, 1.71) | | | |
All combined | | (0.97, 1.73) | 4,583-4,544 - | | |

Shown are association results comparing the group of patients with a more aggressive phenotype (Gleason ≥7 and/or T3 or higher and/or node-positive and/or metastatic disease) with the group of patients with less aggressive tumors (Gleason <7 and T2 or lower).

Supplementary Table 4b. Association results for aggressive and non-aggressive prostate cancer for rs [t] in HOXB13 on 17q21-22

Study group | P value | OR (95% CI) | Aggr. disease: cases (n) | Aggr. disease: freq. | Non-aggr. disease: cases (n) | Non-aggr. disease: freq.
Iceland | | (0.31, 10.05) | | | |
Chicago | | (0.26, 5.40) | | | |
The Netherlands | | (0.27, 3.01) | | | |
Romania | | (0.01, 3.34) | | | |
Spain* | na | na | na | na | na | na
UK | | (0.31, 4.79) | | | |
All combined | | (0.52, 2.10) | 3,300-2,943 - | | |

Shown are association results comparing the group of patients with a more aggressive phenotype (Gleason ≥7 and/or T3 or higher and/or node-positive and/or metastatic disease) with the group of patients with less aggressive tumors (Gleason <7 and T2 or lower).
* For the Spanish study group, the carriers of the HOXB13 mutation lacked the relevant clinical information; hence, a comparison analysis was not possible (na).

Supplementary Table 5a. Summary results for age at diagnosis for rs [a]

Study group | Effect (95% CI) | P value
Iceland | (-2.63, 0.05) |
Chicago | (-2.17, 2.05) | 0.96
The Netherlands | (-2.51, 1.35) | 0.56
Romania | (-9.43, 1.07) | 0.12
Spain | (-9.78, 3.32) | 0.33
UK | (-4.79, -0.16) |
All combined | (-2.15, -0.36) |

Phet = 0.54; I² = 0.0

The effect is in years per risk allele carried; a minus (-) sign denotes a younger age at prostate cancer diagnosis.

Supplementary Table 5b. Summary results for age at diagnosis for rs [t]

Study group | Effect (95% CI) | P value
Iceland | (-9.07, -0.49) |
Chicago | (-4.87, 1.52) | 0.30
The Netherlands | (-2.15, 1.97) | 0.93
Romania | -0.6 (-11.67, 10.47) | 0.92
Spain | na | na
UK | (-4.23, 1.55) | 0.36
All combined | (-2.53, 0.21) |

Phet = 0.41; I² = 0.0

The effect is in years per risk allele carried; a minus (-) sign denotes a younger age at prostate cancer diagnosis.

Supplementary Note

Study populations

Icelandic study population

The Icelandic Cancer Registry (ICR) contains 5,141 Icelandic prostate cancer patients diagnosed from January 1, 1955, to December 31, …. The Icelandic prostate cancer sample collection included 2,315 patients (diagnosed from December 1974 to December 2008) who were recruited from November 2000 until June …. A total of 4,537 patients were included in the study. Of these, 2,315 had genotypes from a genome-wide SNP genotyping effort using the Infinium II assay method and the Sentrix HumanHap300 BeadChip (Illumina, San Diego, CA, USA), and 2,222 had genotypes imputed from the genotypic information of chip-genotyped first- or second-degree relatives. Of the 4,537 prostate cancer patients, 53 were among the 1,795 individuals who had been whole-genome sequenced. The mean age at diagnosis for prostate cancer patients in the ICR is 71 years. In the present study, for all populations, aggressive prostate cancer is defined as Gleason ≥7 and/or T3 or higher and/or node-positive and/or metastatic disease, while less aggressive disease is defined as Gleason <7 and T2 or lower. The 54,444 male controls (27,780 with variants imputed based on chip genotypes and 26,664 with variants imputed using family-based methods) comprise individuals recruited through different genetic research projects at deCODE. The controls have been diagnosed with common diseases of the cardiovascular system (e.g. stroke or myocardial infarction), psychiatric and neurological diseases (e.g. schizophrenia, bipolar disorder), diseases of the endocrine and autoimmune systems (e.g. type 2 diabetes, asthma) or malignant diseases other than prostate cancer; they also include individuals randomly selected from the Icelandic genealogical database. The controls had a mean age of 84 years and the range was from 8 to 105 years. The controls were absent from the nationwide list of prostate cancer patients according to the ICR. The study was approved by the Data Protection Commission of Iceland and the National Bioethics Committee of Iceland. Written informed consent was obtained from all patients and controls. Personal identifiers associated with medical information and blood samples were encrypted with a third-party encryption system as previously described (ref. 14).

The UK

In the Prostate Testing for Cancer and Treatment trial (ProtecT), men aged … years were contacted, provided with information about the uncertainty surrounding PSA testing, detection and radical treatment of early prostate cancer, and offered an appointment for counseling and PSA testing. Recruitment took place at nine sites in the UK; 94,427 men agreed to be tested (50% of men contacted) and 8,807 (~9%) had a raised PSA level. Of those with raised PSA levels, 2,022 (23%) were diagnosed with prostate cancer; 229 men (~12%) had locally advanced (T3 or T4) or metastatic cancers, and the rest had clinically localized (T1c or T2) disease. Men with a PSA level of ≥20 ng/ml were excluded from the trial. Those with locally confined cancers (mostly T1c, but some T2a and T2b) and PSA levels of <20 ng/ml were offered randomization into a three-arm treatment trial (random assignment to active monitoring, radical prostatectomy or radical radiotherapy). Participants will be followed up for 10 years. Study participants found to have locally advanced (≥T3) or distantly advanced disease were not eligible for the ProtecT treatment trial and were referred for routine UK National Health Service care. Ethical approval for the ProtecT study was obtained from the Trent Multi-Centre Research Ethics Committee.

From the ProtecT trial study group, the following samples were selected for the present study: 547 patients with PSA values >3 ng/ml who were diagnosed with prostate cancer after undergoing a needle biopsy (average age at diagnosis 63.0 years); and, as controls, 1,160 men with PSA values between 3 ng/ml and 10 ng/ml who were not diagnosed with prostate cancer after undergoing a needle biopsy (average age at PSA measurement 62.4 years), and 675 men with PSA values <3 ng/ml (average age at PSA measurement 62.7 years).

The Netherlands

The total number of Dutch prostate cancer cases used in this study was 1,545. The Dutch study population consisted of two recruitment sets of prostate cancer cases. Group-A comprised 360 hospital-based cases recruited from January 1999 to June 2006 at the Urology Outpatient Clinic of the Radboud University Nijmegen Medical Centre (RUNMC). Group-B consisted of 707 cases recruited from June 2006 to December 2006 through a population-based cancer registry held by the Comprehensive Cancer Centre IKO. Both groups were of self-reported European descent. The average age at diagnosis for patients in Group-A was 63 years (median 63 years; range 43 to 83 years). The average age at diagnosis for patients in Group-B was 65 years (median 66 years; range 43 to 75 years). The 1,960 control individuals were cancer-free and were matched for age with the cases. They were recruited within a project entitled the Nijmegen Biomedical Study, in the Netherlands. This is a population-based survey conducted by the Department of Epidemiology and Biostatistics and the Department of Clinical Chemistry of RUNMC, in which 9,371 individuals participated from a total of 22,500 age- and sex-stratified, randomly selected inhabitants of Nijmegen. Control individuals from the Nijmegen Biomedical Study were invited to participate in a study on gene-environment interactions in multifactorial diseases, such as cancer.
All the Dutch participants in the present study are of self-reported European descent and were fully informed about the goals and the procedures of the study. The study protocol was approved by the Institutional Review Board of Radboud University, and all study subjects gave written informed consent.

Spain

The Spanish study population used in this study consisted of 735 prostate cancer cases. The cases were recruited from the Oncology Department of Zaragoza Hospital in Zaragoza, Spain, from June 2005 to September …. All patients were of self-reported European descent. Clinical information including age at onset, grade and stage was obtained from medical records. The average age at diagnosis for the patients was 69 years (median 70 years) and the range was from 44 to 83 years. The 1,635 Spanish control individuals were approached at the University Hospital in Zaragoza, and the men were prostate cancer-free at the time of recruitment. Study protocols were approved by the Institutional Review Board of Zaragoza University Hospital. All subjects gave written informed consent.

Chicago

The Chicago study population consisted of 1,956 prostate cancer cases. The cases were recruited from the Pathology Core of Northwestern University's Prostate Cancer Specialized Program of Research Excellence (SPORE) from May 2002 to May …. The average age at diagnosis for the patients was 60 years (median 59 years) and the range was from 39 to 87 years. The 1,272 European American controls and the 467 African American controls were recruited as healthy control subjects for genetic studies at the University of Chicago and Northwestern University Medical School, Chicago, US. All patients from Chicago included in this report were of self-reported European descent. The controls were of either self-reported European descent or African descent. Study protocols were approved by the Institutional Review Boards of Northwestern University and the University of Chicago. All subjects gave written informed consent.

Romania

The Romanian study population consisted of 738 prostate cancer cases. The cases were recruited from the Urology Clinic Theodor Burghele of The University of Medicine and Pharmacy Carol Davila Bucharest, Romania, from May 2008 to November …. All patients were of self-reported European descent. Clinical information including age at onset, grade and stage was obtained from medical records at the hospital. The average age at diagnosis for the cases was 70 years (median 71 years) and the range was from 46 to 89 years. The 932 Romanian controls were recruited at the General Surgery Clinic St. Mary and at the Urology Clinic Theodor Burghele of The University of Medicine and Pharmacy Carol Davila Bucharest, Romania. The average age of the controls was 60 years (median 62 years), with a range from 19 to 87 years. The controls were cancer-free at the time of recruitment. Study protocols were approved by the National Ethical Board of the Romanian Medical Doctors Association in Romania. All subjects gave written informed consent.

Hong Kong

The Hong Kong study population consisted of 498 prostate cancer cases. The cases were recruited from the Division of Urology, Department of Surgery, Prince of Wales Hospital in Hong Kong, China, from October 2007 to June …. All patients were of self-reported Chinese descent. Clinical information including age at onset, grade and stage was obtained from medical records. The average age at diagnosis for the patients was 70.3 years (median 71 years) and the range was from 46 to 92 years. The study protocol was approved by the joint ethics committee of The Chinese University of Hong Kong and New Territories East Cluster Clinical Research. All subjects gave written informed consent.

Sequencing, Genotyping and Statistical Methods

Whole-genome sequencing and SNP imputation. Of the 1,795 whole-genome sequenced individuals used in the current study, 53 have been diagnosed with prostate cancer according to the nationwide list maintained by the Icelandic Cancer Registry. The key steps of the whole-genome sequencing and imputation are as follows:

1. Sample preparation. Paired-end libraries for sequencing were prepared according to the manufacturer's instructions (Illumina). In short, approximately 5 μg of genomic DNA, isolated from frozen blood samples, was fragmented to a mean target size of 300 bp using a Covaris E210 instrument. The resulting fragmented DNA was end-repaired using T4 and Klenow polymerases and T4 polynucleotide kinase with 10 mM dNTP, followed by addition of an 'A' base at the ends using Klenow exo fragment (3′ to 5′ exo minus) and dATP (1 mM). Sequencing adaptors containing 'T' overhangs were ligated to the DNA products, followed by agarose (2%) gel electrophoresis. Fragments of about 400 bp were isolated from the gels (QIAGEN Gel Extraction Kit), and the adaptor-modified DNA fragments were PCR-enriched for ten cycles using Phusion DNA polymerase (Finnzymes Oy) and PCR primers PE 1.0 and PE 2.0 (Illumina). Enriched libraries were further purified by agarose (2%) gel electrophoresis as described above. The quality and concentration of the libraries were assessed with the Agilent 2100 Bioanalyzer using the DNA 1000 LabChip (Agilent). Barcoded libraries were stored at -20 °C.

All steps in the workflow were monitored using an in-house laboratory information management system with barcode tracking of all samples and reagents.

2. DNA sequencing. Template DNA fragments were hybridized to the surface of flow cells (Illumina PE flowcell, v4) and amplified to form clusters using the Illumina cBot. In brief, DNA (3-10 pM) was denatured, followed by hybridization to grafted adaptors on the flow cell. Isothermal bridge amplification using Phusion polymerase was followed by linearization of the bridged DNA, denaturation, blocking of 3′ ends and hybridization of the sequencing primer. Sequencing-by-synthesis was performed on Illumina GAIIx instruments equipped with paired-end modules. Paired-end libraries for whole-genome sequencing were sequenced using either … or … cycles of incorporation and imaging with Illumina sequencing kits, v4 or v5 (TruSeq). Each library or sample was initially run on a single lane for validation, followed by further sequencing on 4 lanes with targeted raw cluster densities of … k/mm², depending on the version of the data imaging and analysis packages. Imaging and analysis of the data were performed using the SCS2.6/RTA1.6, SCS2.8/RTA1.8 or SCS2.9/RTA1.9 software packages from Illumina, respectively. Real-time analysis involved conversion of image data to base calls in real time.

3. Alignment. Reads were aligned to NCBI Build 36 of the human reference sequence using the Burrows-Wheeler Aligner (BWA) (ref. 15). Alignments were merged into a single BAM file and marked for duplicates using Picard 1.55. Only non-duplicate reads were used for the downstream analyses.

4. SNP calling and genotyping in whole-genome sequencing. Variants were called using the Genome Analysis Toolkit (GenomeAnalysisTK) g0acaf2d (ref. 16) by applying base quality score recalibration and indel realignment, and by performing SNP and INDEL discovery and genotyping using standard hard filtering. Variants were annotated using SNPeff and Genome Analysis Toolkit g1f1233b, retaining only the highest-impact effect. The allele frequency used for filtering was based on phased genotypes of 32,496,467 SNPs and INDELs from the 1,795 whole-genome sequenced Icelanders.

Long range phasing. Long range phasing of all chip-genotyped individuals was performed with methods described previously. In brief, phasing is achieved using an iterative algorithm that phases a single proband at a time, given the available phasing information about everyone else who shares a long haplotype identically by state with the proband. Given the large fraction of the Icelandic population that has been chip-typed, accurate long range phasing is available genome-wide for all chip-typed Icelanders.

Genotype imputation. We imputed the SNPs identified and genotyped through sequencing into all Icelanders who had been phased with long range phasing, using the same model as used by IMPUTE (ref. 23). The genotype data from sequencing can be ambiguous due to low sequencing coverage. In order to phase the sequencing genotypes, an iterative algorithm was applied for each SNP with alleles 0 and 1. We let H be the long range phased haplotypes of the sequenced individuals and applied the following algorithm:

1. For each haplotype $h$ in $H$, use the Hidden Markov Model of IMPUTE to calculate, for every other $k$ in $H$, the likelihood, denoted $\gamma_{h,k}$, of $h$ having the same ancestral source as $k$ at the SNP. For every $h$ in $H$, initialize the parameter $\theta_h$, which specifies how likely the one allele of the SNP is to occur on the background of $h$, from the genotype likelihoods obtained from sequencing. The genotype likelihood $L_g$ is the probability of the observed sequencing data at the SNP for a given individual, assuming $g$ is the true genotype at the SNP. If $L_0$, $L_1$ and $L_2$ are the likelihoods of the genotypes 0, 1 and 2 in the individual that carries $h$, then set

$$\theta_h = \frac{L_1 + 2L_2}{2(L_0 + L_1 + L_2)}.$$

2. For every pair of haplotypes $h$ and $k$ in $H$ that are carried by the same individual, use the other haplotypes in $H$ to predict the genotype of the SNP on the backgrounds of $h$ and $k$:

$$\tau_h = \frac{\sum_{l \in H \setminus \{h,k\}} \gamma_{h,l}\,\theta_l}{\sum_{l \in H \setminus \{h,k\}} \gamma_{h,l}}, \qquad \tau_k = \frac{\sum_{l \in H \setminus \{h,k\}} \gamma_{k,l}\,\theta_l}{\sum_{l \in H \setminus \{h,k\}} \gamma_{k,l}}.$$

Combining these predictions with the genotype likelihoods from sequencing gives un-normalized updated phased genotype probabilities:

$$P_{00} = (1-\tau_h)(1-\tau_k)L_0, \quad P_{10} = \tau_h(1-\tau_k)L_1, \quad P_{01} = (1-\tau_h)\tau_k L_1, \quad P_{11} = \tau_h \tau_k L_2.$$

3. Now use these values to update $\theta_h$ and $\theta_k$ to

$$\theta_h = \frac{P_{10} + P_{11}}{P_{00} + P_{01} + P_{10} + P_{11}}, \qquad \theta_k = \frac{P_{01} + P_{11}}{P_{00} + P_{01} + P_{10} + P_{11}}.$$

4. Repeat steps 2 and 3 while the maximum difference between iterations is greater than a convergence threshold $\varepsilon$. We used $\varepsilon = 10^{-7}$.

Given the long range phased haplotypes and $\theta$, the allele of the SNP on a new haplotype $h$ not in $H$ is imputed as

$$\hat{\theta}_h = \frac{\sum_{l \in H} \gamma_{h,l}\,\theta_l}{\sum_{l \in H} \gamma_{h,l}}.$$

The above algorithm can easily be extended to handle simple family structures, such as parent-offspring pairs and triads, by letting the $P$ distribution run over all founder haplotypes in the family structure. The algorithm also extends trivially to the X chromosome. If source genotype data are ambiguous only in phase, as with chip genotype data, the algorithm is still applied, but all but one of the $L$s will be 0.

In some instances, the reference set was intentionally enriched for carriers of the minor allele of a rare SNP in order to improve imputation accuracy. In this case, expected allele counts will be biased toward the minor allele of the SNP. Call the enrichment of the minor allele $E$, let $\theta'$ be the expected minor allele count calculated from the naïve imputation method, and let $\theta$ be the unbiased expected allele count; then

$$\theta' = \frac{E\theta}{1 - \theta + E\theta}, \qquad \text{and hence} \qquad \theta = \frac{\theta'}{E - (E-1)\theta'}.$$

This adjustment was applied to all imputations based on enriched imputation sets. We note that if $\theta'$ is 0 or 1, then $\theta$ will also be 0 or 1, respectively.

In-silico genotyping. In addition to imputing sequence variants from the whole-genome sequencing effort into chip-genotyped individuals, we also performed a second imputation step in which genotypes were imputed into relatives of chip-genotyped individuals, creating in-silico genotypes. The inputs to the second imputation step are the fully phased (in particular, every allele has been assigned a parent of origin) imputed and chip-typed genotypes of the available chip-typed individuals. The algorithm used to perform the second imputation step consists of:

1. For each ungenotyped individual (the proband), find all chip-genotyped individuals within two meioses of the individual. The six possible types of two-meiosis relatives of the proband are (ignoring more complicated relationships due to pedigree loops): parents, full siblings, half siblings, grandparents, children and grandchildren. If all pedigree paths from the proband to a genotyped relative go through other genotyped relatives, then that relative is excluded; e.g. if a parent of the proband is genotyped, then the proband's grandparents through that parent are excluded. If the number of meioses in the pedigree around the proband exceeds a threshold (we used 12), then relatives are removed from the pedigree until the number of meioses falls below 12, in order to reduce computational complexity.
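The iterative phasing of sequenced genotypes described under Genotype imputation can be sketched as follows. This is a minimal illustration under stated assumptions, not deCODE's implementation. The IMPUTE-style HMM is not implemented, so the matrix `gamma` of shared-ancestry likelihoods is taken as given. The sketch initializes θ_h = (L1 + 2·L2) / (2·(L0 + L1 + L2)), updates all haplotypes synchronously from the previous iteration's values, and skips an update when every phased genotype probability is zero. The last function applies the enrichment correction θ = θ′ / (E − (E − 1)·θ′). All function names are hypothetical.

```python
def init_theta(L0, L1, L2):
    """Initial theta_h = (L1 + 2*L2) / (2*(L0 + L1 + L2)) from sequencing likelihoods."""
    return (L1 + 2.0 * L2) / (2.0 * (L0 + L1 + L2))


def phase_sequenced_snp(gamma, pairs, likelihoods, eps=1e-7, max_iter=1000):
    """Iteratively estimate theta_h for every sequenced haplotype at one SNP.

    gamma[h][k] : likelihood that h and k share an ancestral source (assumed given)
    pairs       : (h, k) haplotype indices carried by the same individual
    likelihoods : (L0, L1, L2) for each individual, in the same order as pairs
    """
    n = len(gamma)
    theta = [0.0] * n
    for (h, k), L in zip(pairs, likelihoods):
        theta[h] = theta[k] = init_theta(*L)
    for _ in range(max_iter):
        new = theta[:]  # synchronous update: predictions use the previous theta
        for (h, k), (L0, L1, L2) in zip(pairs, likelihoods):
            others = [l for l in range(n) if l not in (h, k)]
            tau_h = sum(gamma[h][l] * theta[l] for l in others) / sum(gamma[h][l] for l in others)
            tau_k = sum(gamma[k][l] * theta[l] for l in others) / sum(gamma[k][l] for l in others)
            p00 = (1 - tau_h) * (1 - tau_k) * L0
            p10 = tau_h * (1 - tau_k) * L1
            p01 = (1 - tau_h) * tau_k * L1
            p11 = tau_h * tau_k * L2
            total = p00 + p01 + p10 + p11
            if total > 0:  # guard: keep old values if data and predictions fully conflict
                new[h] = (p10 + p11) / total
                new[k] = (p01 + p11) / total
        converged = max(abs(a - b) for a, b in zip(new, theta)) <= eps
        theta = new
        if converged:
            break
    return theta


def impute_haplotype(gamma_row, theta):
    """Expected allele count on a haplotype outside the sequenced set."""
    return sum(g * t for g, t in zip(gamma_row, theta)) / sum(gamma_row)


def debias_enriched(theta_naive, E):
    """Undo minor-allele enrichment E: theta = theta' / (E - (E - 1) * theta')."""
    return theta_naive / (E - (E - 1) * theta_naive)
```

As a toy check, take three sequenced individuals with a flat `gamma` of ones: a confident homozygote for allele 1, a confident homozygote for allele 0, and an individual whose likelihoods are uninformative (1, 1, 1). The uninformative individual's haplotypes settle at 0.5, which is the average of the other sequenced haplotypes.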

21 2. At every point in the genome, calculate the probability for each genotyped relative sharing with the proband based on the autosomal SNPs used for phasing. A multipoint algorithm based on the hidden Markov model Lander-Green multipoint linkage algorithm using fast Fourier transforms is used to calculate these sharing probabilities 34,35. First single point sharing probabilities are calculated by dividing the genome into 0.5cM bins and using the haplotypes over these bins as alleles. Haplotypes that are the same, except at most at a single SNP, are treated as identical. When the haplotypes in the pedigree are incompatible over a bin, then a uniform probability distribution was used for that bin. The most common causes for such incompatibilities are recombinations in member belonging to the pedigree, phasing errors and genotyping errors. Note that since the input genotypes are fully phased, the single point information is substantially more informative than for unphased genotyped, in particular one haplotype of the parent of a genotyped child is always known. The single point distributions are then convolved using the multipoint algorithm to obtain multipoint sharing probabilities at the center of each bin. Genetic distances were obtained from the most recent version of the decode genetic map Based on the sharing probabilities at the center of each bin, all the SNPs from the whole genome sequencing are imputed into the proband. To impute the genotype of the paternal allele of a SNP located at, flanked by bins with centers at and. Starting with the left bin, going through all possible sharing patterns, let be the set of haplotypes of genotyped individuals that share identically by descent within the pedigree with the proband s paternal haplotype given the sharing pattern and be the probability of at the left bin this is the output from step 2 above and let be the expected allele count of the SNP for haplotype. Then is the expected allele count of the 21

paternal haplotype of the proband given the set $S$ of genotyped relatives that share it, and an overall estimate of the allele count given the sharing distribution at the left bin is obtained by averaging over the possible sets $S$. If $S$ is empty, then no relative shares the proband's paternal haplotype and thus there is no information about the allele count. We therefore store the probability that some genotyped relative shares the proband's paternal haplotype, $O_l$, and an expected allele count conditional on the proband's paternal haplotype being shared by at least one genotyped relative, $E_l$. In the same way, we calculate $O_r$ and $E_r$ for the right bin. Linear interpolation is then used to obtain estimates at the SNP from the two flanking bins: $O = (1 - t)\,O_l + t\,O_r$ and $E = (1 - t)\,E_l + t\,E_r$, where $t \in [0, 1]$ is the relative position of the SNP between the two bins. If $p$ is an estimate of the population frequency of the SNP, then $O E + (1 - O)\,p$ is an estimate of the allele count for the proband's paternal haplotype. An expected allele count for the proband's maternal haplotype is obtained in the same way.

Genotype imputation information. The informativeness of genotype imputation was estimated by the ratio of the variance of imputed expected allele counts to the variance of the actual allele counts:

$$\frac{\mathrm{Var}\left(\mathrm{E}(\theta \mid \text{chip data})\right)}{\mathrm{Var}(\theta)},$$

where $\theta \in \{0, 1\}$ is the allele count. $\mathrm{Var}(\mathrm{E}(\theta \mid \text{chip data}))$ was estimated by the observed variance of the imputed expected counts, and $\mathrm{Var}(\theta)$ was estimated by $p(1 - p)$, where $p$ is the allele frequency. For the present study, when imputed genotypes are used, the information value for all SNPs is between 0.92 and

Case-control association testing. Logistic regression was used to test for association between SNPs and disease, treating disease status as the response and expected genotype counts from imputation or allele counts from direct genotyping as covariates. Testing was performed using the likelihood ratio statistic. When testing for association based on the in silico genotypes, controls were matched to cases based on the informativeness of the imputed genotypes, such that for each case, $r$ controls of matching informativeness were chosen. Failing to match cases and controls will lead to a highly inflated genomic control factor and, in some cases, may lead to false-positive findings. The informativeness of the imputation of each of an individual's haplotypes was estimated by taking the average of

$$I(e, p) = \begin{cases} \dfrac{e - p}{1 - p}, & e \ge p, \\ \dfrac{p - e}{p}, & e < p, \end{cases}$$

over all SNPs imputed for the individual, where $e$ is the expected allele count for the haplotype at the SNP and $p$ is the population frequency of the SNP. Note that $I(p, p) = 0$ and $I(0, p) = I(1, p) = 1$. The mean informativeness values cluster into groups corresponding to the most common pedigree configurations used in the imputation, such as imputing from parent into child or from child into parent. Based on this clustering, we divided the haplotypes of individuals into seven groups of varying informativeness, which created 27 groups of individuals of similar imputation informativeness: 7 groups of individuals with both haplotypes having similar informativeness plus 21 groups of individuals with the two haplotypes having different informativeness, minus the one group of individuals with neither haplotype being imputed well. Within each group, we calculated the ratio of the number of controls to the number of cases and chose the largest integer $r$ that was less than this ratio in all the groups. For example, if in one group there were 10.3 times as many controls as cases and in all other groups this ratio was greater, then we would set $r = 10$ and within each group randomly select ten times as many controls as there are cases. For prostate cancer we used
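The last step of the in silico genotyping described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the function names are hypothetical, the per-bin summaries (the probability that at least one genotyped relative shares the proband's haplotype, and the expected allele count given such sharing) are taken as inputs, and the interpolation weight `t` is assumed to be the relative position of the SNP between the two flanking bins.

```python
def interpolate(left: float, right: float, t: float) -> float:
    """Linear interpolation: returns `left` at t = 0 and `right` at t = 1."""
    return (1.0 - t) * left + t * right


def in_silico_allele_count(
    share_left: float,
    count_left: float,
    share_right: float,
    count_right: float,
    t: float,
    pop_freq: float,
) -> float:
    """Expected allele count for one haplotype of an ungenotyped proband (hypothetical helper).

    share_*:  probability that at least one genotyped relative shares the
              proband's haplotype in the left/right flanking bin.
    count_*:  expected allele count given that at least one relative shares it.
    t:        assumed relative position of the SNP between the two bins (0..1).
    pop_freq: population frequency of the allele.
    """
    share = interpolate(share_left, share_right, t)
    count = interpolate(count_left, count_right, t)
    # With probability `share` the count comes from sharing relatives;
    # otherwise there is no information, so fall back to the population frequency.
    return share * count + (1.0 - share) * pop_freq
```

For a SNP midway between two bins with sharing probabilities 0.9 and 0.7, a conditional count of 1.0 in both bins and an allele frequency of 0.1, the estimate is 0.8 × 1.0 + 0.2 × 0.1 = 0.82.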
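The imputation information measure described above can be computed from a list of imputed expected allele counts, as in the following minimal sketch. It assumes the counts are per haplotype, so the true count is 0 or 1 and its variance is p(1 − p) with p the allele frequency; estimating p from the mean expected count when no frequency is supplied is a hypothetical convenience, not something stated in the text.

```python
from statistics import fmean, pvariance
from typing import Optional, Sequence


def imputation_info(expected_counts: Sequence[float], freq: Optional[float] = None) -> float:
    """Ratio of the variance of imputed expected allele counts to p(1 - p).

    expected_counts: imputed expected allele counts, one per haplotype (0..1).
    freq: allele frequency p; if omitted, estimated as the mean expected count.
    """
    values = [float(v) for v in expected_counts]
    p = fmean(values) if freq is None else freq
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency must be strictly between 0 and 1")
    # Observed variance of the imputed counts over the variance of a 0/1 count.
    return pvariance(values) / (p * (1.0 - p))
```

Perfectly imputed haplotypes (counts exactly 0 or 1) give a value of 1, while counts that all equal the allele frequency carry no information and give 0.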
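The case-control test described above (logistic regression with disease status as the response, tested by the likelihood ratio statistic) can be sketched for a single SNP using only the Python standard library. This is a minimal illustration rather than the study's software: it fits an intercept and one covariate (an allele count or imputed expected allele count) by Newton-Raphson, compares the fit with an intercept-only model, and converts the 1-degree-of-freedom chi-square statistic to a P value. It assumes the covariate varies and that cases and controls are not perfectly separated, since otherwise the maximum likelihood estimate does not exist.

```python
import math
from typing import Sequence, Tuple


def _log1pexp(eta: float) -> float:
    """Numerically stable log(1 + exp(eta))."""
    if eta > 0:
        return eta + math.log1p(math.exp(-eta))
    return math.log1p(math.exp(eta))


def log_likelihood(b0: float, b1: float, x: Sequence[float], y: Sequence[int]) -> float:
    """Bernoulli log-likelihood of a logistic model with linear predictor b0 + b1 * x."""
    return sum(yi * (b0 + b1 * xi) - _log1pexp(b0 + b1 * xi) for xi, yi in zip(x, y))


def fit_logistic(x: Sequence[float], y: Sequence[int],
                 max_iter: int = 100, tol: float = 1e-10) -> Tuple[float, float]:
    """Newton-Raphson fit of an intercept b0 and a slope b1."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w                 # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def lrt_association(x: Sequence[float], y: Sequence[int]) -> Tuple[float, float, float]:
    """Return (slope, likelihood ratio statistic, P value) for one SNP."""
    b0, b1 = fit_logistic(x, y)
    n_cases = sum(y)
    b0_null = math.log(n_cases / (len(y) - n_cases))  # intercept-only MLE
    stat = 2.0 * (log_likelihood(b0, b1, x, y) - log_likelihood(b0_null, 0.0, x, y))
    stat = max(stat, 0.0)
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(stat)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return b1, stat, p_value
```

When the allele-count distribution is identical in cases and controls, the slope and statistic are 0 and the P value is 1; a higher mean allele count in cases gives a positive slope.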
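The per-haplotype informativeness used for matching can be sketched as below. The piecewise-linear form is an assumption chosen to satisfy the stated properties (the value is 0 when the expected allele count equals the population frequency and 1 when the expected count is exactly 0 or 1); the function names are hypothetical.

```python
from typing import Sequence


def haplotype_informativeness(e: float, p: float) -> float:
    """I(e, p): 0 when e == p, rising linearly to 1 at e = 0 or e = 1.

    e: expected allele count for the haplotype at a SNP (0..1).
    p: population frequency of the SNP allele, assumed strictly between 0 and 1.
    """
    if e >= p:
        return (e - p) / (1.0 - p)
    return (p - e) / p


def mean_informativeness(expected: Sequence[float], freqs: Sequence[float]) -> float:
    """Average I(e, p) over all SNPs imputed for one haplotype."""
    if len(expected) != len(freqs) or not expected:
        raise ValueError("need equal-length, non-empty inputs")
    total = sum(haplotype_informativeness(e, p) for e, p in zip(expected, freqs))
    return total / len(expected)
```

Averaging over many SNPs gives one score per haplotype; these scores are what cluster by pedigree configuration (for example parent-to-child versus child-to-parent imputation).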
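The grouping and control-selection scheme described above can be sketched as follows. This is a hypothetical illustration, not the study's code: haplotypes are assumed to be already assigned to one of seven informativeness classes (labelled 0 to 6, with 0 assumed to mean "not imputed well"), an individual's group is the unordered pair of classes of their two haplotypes, and dropping the poor/poor pair reproduces the count of 27 groups. The ratio r is taken as the largest integer strictly below the smallest control-to-case ratio across groups, following the wording "less than this ratio".

```python
import math
import random
from collections.abc import Hashable
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple

N_CLASSES = 7   # informativeness classes per haplotype (assumed labels 0..6)
POOR_CLASS = 0  # hypothetical label for "not imputed well"

Group = Tuple[int, int]


def individual_groups() -> List[Group]:
    """All unordered class pairs except (poor, poor): 7 + 21 - 1 = 27 groups."""
    pairs = combinations_with_replacement(range(N_CLASSES), 2)
    return [pair for pair in pairs if pair != (POOR_CLASS, POOR_CLASS)]


def group_of(class_a: int, class_b: int) -> Group:
    """Group key for an individual whose two haplotypes fall in classes a and b."""
    return (min(class_a, class_b), max(class_a, class_b))


def control_ratio(n_cases: Dict[Group, int], n_controls: Dict[Group, int]) -> int:
    """Largest integer r strictly less than controls/cases in every group with cases."""
    ratios = [n_controls.get(g, 0) / n for g, n in n_cases.items() if n > 0]
    return max(math.ceil(min(ratios)) - 1, 0)


def match_controls(
    cases: Dict[Group, List[Hashable]],
    controls: Dict[Group, List[Hashable]],
    seed: int = 0,
) -> Dict[Group, List[Hashable]]:
    """Randomly draw r controls per case within each informativeness group."""
    rng = random.Random(seed)
    r = control_ratio({g: len(v) for g, v in cases.items()},
                      {g: len(v) for g, v in controls.items()})
    # r * cases < controls in every group, so the sample size is always feasible.
    return {g: rng.sample(controls.get(g, []), r * len(ids))
            for g, ids in cases.items() if ids}
```

With 10.3 times as many controls as cases in the tightest group, `control_ratio` returns 10, and `match_controls` then draws ten controls per case in every group.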

References
1. Al Olama, A.A. et al. Multiple loci on 8q24 associated with prostate cancer susceptibility. Nat Genet 41 (2009).
2. Zheng, S.L. et al. Association between two unlinked loci at 8q24 and prostate cancer risk among European Americans. J Natl Cancer Inst 99 (2007).
3. Gudmundsson, J. et al. Genome-wide association study identifies a second prostate cancer susceptibility variant at 8q24. Nat Genet 39 (2007).
4. Crowther-Swanepoel, D. et al. Common variants at 2q37.3, 8q24.21, 15q21.3 and 16q24.1 influence chronic lymphocytic leukemia risk. Nat Genet 42 (2010).
5. Gudmundsson, J. et al. Genome-wide association and replication studies identify four variants associated with prostate cancer susceptibility. Nat Genet 41 (2009).
6. Yeager, M. et al. Identification of a new prostate cancer susceptibility locus on chromosome 8q24. Nat Genet 41 (2009).
7. Easton, D.F. et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature 447 (2007).
8. Yeager, M. et al. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat Genet 39 (2007).
9. Haiman, C.A. et al. A common genetic risk factor for colorectal and prostate cancer. Nat Genet 39 (2007).
10. Zanke, B.W. et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat Genet 39 (2007).
11. Tomlinson, I. et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q. Nat Genet 39 (2007).
12. Amundadottir, L.T. et al. A common variant associated with prostate cancer in European and African populations. Nat Genet 38 (2006).
13. Kiemeney, L.A. et al. Sequence variant on 8q24 confers susceptibility to urinary bladder cancer. Nat Genet 40 (2008).
14. Gulcher, J.R., Kristjansson, K., Gudbjartsson, H. & Stefansson, K. Protection of privacy by third-party encryption in genetic research in Iceland. Eur J Hum Genet 8 (2000).
15. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25 (2009).
16. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20 (2010).
17. DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43 (2011).
18. Kong, A. et al. Detection of sharing by descent, long-range phasing and haplotype imputation. Nat Genet 40 (2008).
19. Kong, A. et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467 (2010).
20. Sulem, P. et al. Identification of low-frequency variants associated with gout and serum uric acid levels. Nat Genet 43 (2011).
21. Rafnar, T. et al. Mutations in BRIP1 confer high risk of ovarian cancer. Nat Genet 43 (2011).
22. Stacey, S.N. et al. A germline variant in the TP53 polyadenylation signal confers cancer susceptibility. Nat Genet 43 (2011).
23. Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39 (2007).


On Missing Data and Genotyping Errors in Association Studies On Missing Data and Genotyping Errors in Association Studies Department of Biostatistics Johns Hopkins Bloomberg School of Public Health May 16, 2008 Specific Aims of our R01 1 Develop and evaluate new

More information

Supplementary Material to. Genome-wide association study identifies new HLA Class II haplotypes strongly protective against narcolepsy

Supplementary Material to. Genome-wide association study identifies new HLA Class II haplotypes strongly protective against narcolepsy Supplementary Material to Genome-wide association study identifies new HLA Class II haplotypes strongly protective against narcolepsy Hyun Hor, 1,2, Zoltán Kutalik, 3,4, Yves Dauvilliers, 2,5 Armand Valsesia,

More information

Using the Bravo Liquid-Handling System for Next Generation Sequencing Sample Prep

Using the Bravo Liquid-Handling System for Next Generation Sequencing Sample Prep Using the Bravo Liquid-Handling System for Next Generation Sequencing Sample Prep Tom Walsh, PhD Division of Medical Genetics University of Washington Next generation sequencing Sanger sequencing gold

More information

Supplementary Methods. 1. Cancer Genetic Markers of Susceptibility (CGEMS) Prostate Cancer Genome-Wide Association Scan

Supplementary Methods. 1. Cancer Genetic Markers of Susceptibility (CGEMS) Prostate Cancer Genome-Wide Association Scan Supplementary Methods 1. Cancer Genetic Markers of Susceptibility (CGEMS) Prostate Cancer Genome-Wide Association Scan The CGEMS data portal provides public access to summary results for approximately

More information

SUPPLEMENTARY FIGURES

SUPPLEMENTARY FIGURES SUPPLEMENTARY FIGURES Supplementary Figure 1 Regional association plots for genome-wide significant PCOS signals. Dots represents individual SNP association P-values (on the log10 scale) in the 23andMe

More information

Dan Koller, Ph.D. Medical and Molecular Genetics

Dan Koller, Ph.D. Medical and Molecular Genetics Design of Genetic Studies Dan Koller, Ph.D. Research Assistant Professor Medical and Molecular Genetics Genetics and Medicine Over the past decade, advances from genetics have permeated medicine Identification

More information

GENOME-WIDE ASSOCIATION STUDIES

GENOME-WIDE ASSOCIATION STUDIES GENOME-WIDE ASSOCIATION STUDIES SUCCESSES AND PITFALLS IBT 2012 Human Genetics & Molecular Medicine Zané Lombard IDENTIFYING DISEASE GENES??? Nature, 15 Feb 2001 Science, 16 Feb 2001 IDENTIFYING DISEASE

More information

BWA alignment to reference transcriptome and genome. Convert transcriptome mappings back to genome space

BWA alignment to reference transcriptome and genome. Convert transcriptome mappings back to genome space Whole genome sequencing Whole exome sequencing BWA alignment to reference transcriptome and genome Convert transcriptome mappings back to genome space genomes Filter on MQ, distance, Cigar string Annotate

More information

Novel sequence variants associated with bone mineral density

Novel sequence variants associated with bone mineral density Supplementary Material: Novel sequence variants associated with bone mineral density Unnur Styrkarsdottir 1*, Bjarni V. Halldorsson 1,2, Solveig Gretarsdottir 1, Daniel F. Gudbjartsson 1, G. Bragi Walters

More information

Gene expression profiling predicts clinical outcome of prostate cancer. Gennadi V. Glinsky, Anna B. Glinskii, Andrew J. Stephenson, Robert M.

Gene expression profiling predicts clinical outcome of prostate cancer. Gennadi V. Glinsky, Anna B. Glinskii, Andrew J. Stephenson, Robert M. SUPPLEMENTARY DATA Gene expression profiling predicts clinical outcome of prostate cancer Gennadi V. Glinsky, Anna B. Glinskii, Andrew J. Stephenson, Robert M. Hoffman, William L. Gerald Table of Contents

More information

Non-parametric methods for linkage analysis

Non-parametric methods for linkage analysis BIOSTT516 Statistical Methods in Genetic Epidemiology utumn 005 Non-parametric methods for linkage analysis To this point, we have discussed model-based linkage analyses. These require one to specify a

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Fig 1. Comparison of sub-samples on the first two principal components of genetic variation. TheBritishsampleisplottedwithredpoints.The sub-samples of the diverse sample

More information

Human population sub-structure and genetic association studies

Human population sub-structure and genetic association studies Human population sub-structure and genetic association studies Stephanie A. Santorico, Ph.D. Department of Mathematical & Statistical Sciences Stephanie.Santorico@ucdenver.edu Global Similarity Map from

More information

Hereditary Prostate Cancer: From Gene Discovery to Clinical Implementation

Hereditary Prostate Cancer: From Gene Discovery to Clinical Implementation Hereditary Prostate Cancer: From Gene Discovery to Clinical Implementation Kathleen A. Cooney, MD MACP Duke University School of Medicine Duke Cancer Institute (No disclosures to report) Overview Prostate

More information

Supplementary note: Comparison of deletion variants identified in this study and four earlier studies

Supplementary note: Comparison of deletion variants identified in this study and four earlier studies Supplementary note: Comparison of deletion variants identified in this study and four earlier studies Here we compare the results of this study to potentially overlapping results from four earlier studies

More information

PREPARED FOR: U.S. Army Medical Research and Materiel Command Fort Detrick, Maryland

PREPARED FOR: U.S. Army Medical Research and Materiel Command Fort Detrick, Maryland AD Award Number: W81XWH-06-1-0181 TITLE: Analysis of Ethnic Admixture in Prostate Cancer PRINCIPAL INVESTIGATOR: Cathryn Bock, Ph.D. CONTRACTING ORGANIZATION: Wayne State University Detroit, MI 48202 REPORT

More information

Nature Biotechnology: doi: /nbt.1904

Nature Biotechnology: doi: /nbt.1904 Supplementary Information Comparison between assembly-based SV calls and array CGH results Genome-wide array assessment of copy number changes, such as array comparative genomic hybridization (acgh), is

More information

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB Analysis Kits Next-generation performance in liquid biopsies 2 Accelerating clinical research From liquid biopsy to next-generation

More information

Utility of Incorporating Genetic Variants for the Early Detection of Prostate Cancer

Utility of Incorporating Genetic Variants for the Early Detection of Prostate Cancer Utility of Incorporating Genetic Variants for the Early Detection of Prostate Cancer Robert K. Nam, 1,9 William W. Zhang, 1 John Trachtenberg, 6 Arun Seth, 2 Laurence H. Klotz, 1 Aleksandra Stanimirovic,

More information

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

Accessing and Using ENCODE Data Dr. Peggy J. Farnham 1 William M Keck Professor of Biochemistry Keck School of Medicine University of Southern California How many human genes are encoded in our 3x10 9 bp? C. elegans (worm) 959 cells and 1x10 8 bp 20,000

More information

The BARD1 Cys557Ser Variant and Breast Cancer Risk in Iceland

The BARD1 Cys557Ser Variant and Breast Cancer Risk in Iceland The BARD1 Cys557Ser Variant and Breast Cancer Risk in Iceland PLoS MEDICINE Simon N. Stacey 1*, Patrick Sulem 1, Oskar T. Johannsson 2, Agnar Helgason 1, Julius Gudmundsson 1, Jelena P. Kostic 1, Kristleifur

More information

MODULE NO.14: Y-Chromosome Testing

MODULE NO.14: Y-Chromosome Testing SUBJECT Paper No. and Title Module No. and Title Module Tag FORENSIC SIENCE PAPER No.13: DNA Forensics MODULE No.21: Y-Chromosome Testing FSC_P13_M21 TABLE OF CONTENTS 1. Learning Outcome 2. Introduction:

More information

Characterisation of structural variation in breast. cancer genomes using paired-end sequencing on. the Illumina Genome Analyser

Characterisation of structural variation in breast. cancer genomes using paired-end sequencing on. the Illumina Genome Analyser Characterisation of structural variation in breast cancer genomes using paired-end sequencing on the Illumina Genome Analyser Phil Stephens Cancer Genome Project Why is it important to study cancer? Why

More information