Detection and Validation of Clinically Relevant High Order Epistatic Interactions in a BRCA2 Positive Breast Cancer Population


precisionlife MARKERS: AI-enabled precision medicine

Gert Lykke Møller, Erling Mellerup, Claus Erik Jensen, Dorota Matelska, Steve Gardner

Introduction

Genome-wide association studies (GWAS) aim to find single genetic variant loci associated with specific phenotypes. While GWAS has been useful for identifying disease-associated factors, it is known to provide only a limited model for explaining complex diseases, as very few loci have significant effect sizes and most diseases are highly polygenic (Boyle, 2017). In addition, GWAS cannot directly include the impact of non-genomic factors such as phenotype, lifestyle and comorbidities that may modulate disease processes and exert significant influence over disease risks.

Current detection methods for disease-associated combinations of SNPs (epistatic interactions) can find only combinations of two or at most three SNPs from a preselected list. This significantly limits the insights that can be gained from the analysis.

precisionlife MARKERS radically extends this capability. It is a highly innovative and massively scalable combinatorial multi-omics association platform that can detect and annotate high-order epistatic interactions at genome-wide and disease-population scale. It can find and statistically validate specific combinations of up to 20 SNP genotypes (and/or other non-genomic features) that are found in many cases and zero or few controls, and associate those combinations with specific disease phenotypes.

precisionlife MARKERS overcomes three major limitations of existing large-scale analysis methods such as GWAS:

- finding combinations of multiple features that, in a specific combination, are found in patients (cases) but not controls, and associating them with an observed outcome (e.g. disease risk, protective effect or therapy response),
- identifying and validating higher-order interactions (e.g. with 20+ features) in tractable time on affordable hardware,
- including different types of features (genomic, phenotypic, clinical, lifestyle and other factors) in the associations.

This opens up new opportunities to generate value from existing genomic and related patient population datasets for:

- NOVEL TARGET DISCOVERY AND ADAPTIVE TRIALS DESIGN
- DRUG REPURPOSING
- DISEASE PROTECTIVE EFFECTS
- IMPROVEMENT OF EXISTING POLYGENIC RISK SCORES
- DESIGN OF PERSONALIZED COMBINATORIAL THERAPY REGIMENS

These are illustrated below by an analysis of the CIMBA consortium's BRCA2-positive population in the context of breast cancer disease risk (Chenevix-Trench, et al., 2007), run on a single IBM Minsky Power8 NVLink system with 4 x Nvidia GP100 GPUs:

- 7,978 BRCA2-positive cases + controls
- 1,000 fully random permutations
- 200K SNPs / person
- 10⁶⁰ possible n-SNP combinations
- 3-4 day run on GPUs
- distinct patient cohorts
- SNPs in combination
- 3-5 drug repurposing opportunities in single network
- disease protective effect SNPs

precisionlife is a registered trademark of RowAnalytics Ltd
USA: One Broadway, Cambridge, MA T: +1 (617)
UK: C9 Glyme Court, Langford Lane, Kidlington OX5 1LQ T:

Background

Over the last 15 years of applying gene sequence analysis and GWAS, we have learned that diseases, especially common chronic diseases, are much more complex than originally thought (Low SK, 2018). Even the most important disease loci have small effect sizes, and tens or hundreds of variants and other external (e.g. phenotypic/environmental) factors may contribute to disease risks or protective effects (Visscher, et al., 2017). Diseases that were previously single diagnoses can now be stratified into multiple patient subgroups using just a few combined factors (Ahlqvist E, 2018).

We now understand better the complex and nuanced interconnectedness of gene regulatory networks (Boyle, 2017) and genetic control regions (ENCODE Project Consortium, 2012). Complex diseases are often caused by non-coding variants, which do not affect protein structure but may affect gene expression (Li Y, 2016). The impact of such variants appears to be stronger when they occur in active chromatin and in expression quantitative trait loci (eQTLs), particularly in chromatin that is active in cell types relevant to the disease (Trynka, 2013).

It has therefore been hypothesised that complex disease is driven by an accumulation of weak effects on the key genes and regulatory pathways that drive disease-related processes (Chakravarti, 2016). We believe that this interpretation is correct but incomplete, primarily because GWAS analyses have been ineffective at finding high-order SNP combinations that synergistically affect disease status (Wan, 2010). We therefore built precisionlife MARKERS to enable analysis of the combined effects of multiple variants affecting several genes and regulatory pathways.

Finding High-Order Epistatic Interactions

Finding combinations of SNP genotypes (and other features) that are highly associated with a specific phenotype in a large study is computationally challenging.
In a study with three types of SNP genotype (normal homozygote, heterozygote and variant homozygote), the number of possible combinations is $\frac{n!\,3^{r}}{r!\,(n-r)!}$, where n = the total number of SNPs and r = the maximum combinatorial order. The number of possible combinations increases exponentially with the order. In a study analyzing 500,000 SNPs, the theoretical number of combinations of ten SNP genotypes is 1 × 10⁵⁷. For a medium-sized study such as the full CIMBA dataset (15,000 patients, 200,000 SNPs per person), each additional step in combinatorial order increases the number of combinations to be tested by a factor of over 10⁵. This becomes even more problematic if a large number of permutations are used to correct for multiple testing and establish statistical significance.

The challenges of combinatorial expansion and feature heterogeneity have made higher-order SNP and multi-modal feature combinations computationally intractable for GWAS datasets. Hence such networks have not been observed, nor have their associations with phenotype been described. It has not been possible to identify and validate causal variants that are involved in disease processes only in the context of multiple other specific features. With precisionlife MARKERS it is now possible, routinely and on affordable GPU servers, to identify multiple genetically non-overlapping or minimally overlapping cohorts within a single disease patient population that share high-order disease-associated combinatorial features (Mellerup, 2017).

Methodology

precisionlife MARKERS is a massively scalable combinatorial multi-omics association platform that enables the detection of high-order epistatic interactions at genome-wide study scale.
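The combinatorial count given in the section on finding high-order epistatic interactions can be evaluated directly. The sketch below is only an illustration of the formula as written, C(n, r) · 3^r; the function names are hypothetical and are not part of precisionlife MARKERS.

```python
from math import comb, log10


def genotype_combinations(n_snps: int, order: int, genotypes_per_snp: int = 3) -> int:
    """Ways to choose `order` SNPs out of `n_snps` and give each one of
    `genotypes_per_snp` genotypes: C(n, r) * g**r (exact integer)."""
    return comb(n_snps, order) * genotypes_per_snp ** order


def growth_factor(n_snps: int, order: int, genotypes_per_snp: int = 3) -> float:
    """Multiplicative increase in the count when moving from `order` to `order + 1`."""
    return (genotype_combinations(n_snps, order + 1, genotypes_per_snp)
            / genotype_combinations(n_snps, order, genotypes_per_snp))


if __name__ == "__main__":
    # Order of magnitude for 500,000 SNPs taken ten at a time.
    print(round(log10(genotype_combinations(500_000, 10)), 1))
    # Step-to-step growth for 200,000 SNPs at low and higher orders.
    print(growth_factor(200_000, 1), growth_factor(200_000, 9))
```

Evaluated literally, the formula gives a value on the order of 10⁵⁵ for n = 500,000 and r = 10, so the figure quoted in the text may rest on a slightly different counting convention. For 200,000 SNPs the step-to-step factor is 3(n − r)/(r + 1), roughly 3 × 10⁵ between orders 1 and 2 and roughly 6 × 10⁴ between orders 9 and 10.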
The platform analyses GWAS datasets that have been prepared using standard techniques on distributed GPU instances, applying a pre-filtering step and a 5-stage automated analysis workflow:

Pre-filtering: optional removal of SNPs in linkage disequilibrium (LD) using LD clumping, or of SNPs that are not likely to be of sufficient statistical significance for the analysis, or selection of specific SNPs and features (usually hypothesis driven)

1. Mining: finding all (or most) of the distinct n-combinations of SNP genotypes and/or other types of features found in the cases but not in the controls (or vice versa if the study is focusing on protective factors)
2. Permutations: repeat mining using 1,000 random permutations of all cases:controls using the same mining parameters
3. Network analysis & validation: find networks, determine p-value with FDR correction to eliminate random observations
4. Network annotation: annotate networks using a semantic graph containing SNP IDs, genes, pathways, druggable targets, pharmacogenetic interactions, epigenetic modifications and other features
5. Reclustering: merge/correlate validated networks sharing specific features involving lifestyle factors, pathways, and others available in the specific disease datasets to test biological hypotheses interactively
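The mining and permutation stages can be illustrated on a toy scale. The sketch below is a brute-force stand-in, not the platform's algorithm (which the text does not describe): it enumerates every genotype combination of a fixed order, keeps those seen in at least `min_cases` cases and at most `max_controls` controls, and then repeats the mining on shuffled case/control labels. All names and parameters are hypothetical.

```python
import random
from itertools import combinations


def mine(genotypes, is_case, order, min_cases, max_controls=0):
    """Return {combination: n_cases} for genotype combinations of the given order.

    genotypes: list of per-person genotype lists (0, 1 or 2 per SNP)
    is_case:   list of booleans, True for cases
    A combination is a tuple of (snp_index, genotype) pairs.
    """
    counts = {}  # combination -> [n_cases, n_controls]
    for row, case in zip(genotypes, is_case):
        for combo in combinations(enumerate(row), order):
            tally = counts.setdefault(combo, [0, 0])
            tally[0 if case else 1] += 1
    return {combo: t[0] for combo, t in counts.items()
            if t[0] >= min_cases and t[1] <= max_controls}


def permutation_p_value(genotypes, is_case, order, min_cases,
                        n_permutations=1000, seed=0):
    """Empirical p-value: share of label permutations whose mining run finds
    at least as many combinations as the run on the real labels."""
    rng = random.Random(seed)
    observed = len(mine(genotypes, is_case, order, min_cases))
    labels = list(is_case)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        if len(mine(genotypes, labels, order, min_cases)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

Exhaustive enumeration like this is only feasible for a handful of SNPs; it is meant to make the case/control criterion and the permutation logic concrete, not to suggest how genome-wide scale is reached.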

Example Analysis: BRCA2 and Breast Cancer

After more than a decade of clinical testing of BRCA1 and BRCA2, there remains considerable uncertainty regarding cancer risks associated with inherited mutations of these genes. The variable penetrance is most striking for BRCA2, and it affects treatment decisions. Reported estimates for lifetime breast cancer risk range between 18-88% (Mavaddat N, 2013). Women with the same BRCA2 mutation may develop breast, ovarian or other cancers at different ages or not at all.

In large retrospective studies, several common variants were associated with breast cancer risk for BRCA2 carriers. The effect sizes of these SNPs are small, but in specific combinations these alleles may be useful in stratifying individuals into distinct risk categories that reflect their true risk more accurately than a generalized group average risk.

We used precisionlife MARKERS to investigate high-order combinations of genetic variations that may be able to distinguish which female carriers of BRCA2 mutations will develop breast cancer. Input data contained the genotypes of 200,908 variants in each of 7,978 BRCA2 carriers, including 1,576 patients who had developed breast cancer before the age of 40 (cases) and 6,402 healthy subjects who had not developed breast cancer. These data were collected by the CIMBA consortium (Chenevix-Trench, et al., 2007) using an iCOGS genotyping array. The participants, the ethics statement, and the selection of 200,908 SNPs have previously been described in detail (Hamdi, Soucy, Pastinen, & al., 2017).

Figure 1. Manhattan plot showing p-values for over-representation of single variants in cases, based on Fisher's exact test for single-locus associations, computed with PLINK 1.9. The variants with p-values < 10⁻⁵ are marked with the names of adjacent genes. Only FGFR2 variants satisfy the criterion for genome-wide significance (p-value < ⁸). Variants identified in networks of interacting SNPs are shown in green.

With high-order correlations there is always the potential for random observations, and we take care to test for and remove these. We first run 1,000 fully randomized permutations of the mining and apply a p-value cutoff of 0.05 to the simple networks. Further correction for multiple testing applies the Benjamini-Hochberg procedure (Benjamini Y, 1995).

Analysis using a False Discovery Rate (FDR) of 5% identified 3,045 states (unique n-SNP combinations) at layers (order) 5 and 7-13 that were found to differentiate breast cancer susceptibility, shown in Figure 2. The penetrance in the cohort depends on the FDR used, as shown in Table 1.

Table 1. Penetrance for simple networks (sets of non-redundant states) validated using different FDRs for the BRCA2 dataset

Figure 2. Distribution of states showing combinatorial order from 5 to 13 at FDR = 5%. Size of node is proportional to the co-occurrence of the SNPs in the state in patient cases.
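The Benjamini-Hochberg step mentioned above can be sketched in a few lines. This is a generic implementation of the standard step-up procedure applied to a list of p-values; the function name is illustrative, and how the platform derives the per-network p-values is not shown here.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    (i.e. the finding is kept as significant) at the given FDR.

    Step-up rule: find the largest rank k such that p_(k) <= k / m * fdr,
    then reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Because the rule looks for the largest passing rank, a p-value can be accepted even if it misses its own rank's threshold in isolation, as long as a larger p-value further down the sorted list passes.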

Interestingly, the 3,045 states cluster (based on the SNPs they contain) into 533 genetically non-redundant sets. Between these there are 141,400 pairs that share at least one case, and 76,609 pairs that share at least 10 cases. There are 16 commonly occurring genes (belonging to between states) associated with non-zero genotypes (i.e., containing minor alleles). Hierarchical clustering (based on co-occurrences of cases) clusters these 16 genes into five distinct groups, shown in Figure 3.

The Venn diagram in Figure 4 shows the number of states in which variants corresponding to these genes co-occur. Based on this, we can extract six sets of states: five that each contain genes from one of the five variant groups, and a remaining set containing none of them.

Figure 3. Hierarchical cluster analysis of the 16 most commonly occurring genes associated with non-zero genotypes.

Figure 4. Venn diagram of overlap of five canonical gene groups.

The states that include one or more of these six features cluster well when mapped on the first two components in Principal Component Analysis (PCA) (Figure 5), and in the network in which edges correspond to at least 10 common cases between two given states (Figure 6). Interestingly, all states containing none of the above genes associated with non-zero genotypes (green dots in the PCA plot) comprise only variants with zero (major allele homozygous) genotypes.

Figure 5. Clustering of canonical gene groups using first two PCA dimensions.

Figure 6. Cluster analysis of the 16 most commonly occurring genes associated with non-zero genotypes.

The genes in the six clusters are known to be associated with major breast cancer disease processes, including the progression from ductal carcinoma in situ (DCIS) to invasive ductal carcinoma (IDC), Golgi-associated MT organization and stabilization, cell polarity and motility, invasion and metastasis, promotion of ERα-mediated transcription, formation of vascular networks, cancer cell survival, and determination of cell fate in specific cell types. There are also multiple variants that have not previously been reported as being associated with breast cancer disease risk or processes.
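The case-sharing network described in this section (an edge between two states when they have at least 10 cases in common) can be built with simple set intersections. The sketch below assumes each state is already mapped to the set of case identifiers in which it occurs; the data layout and names are hypothetical.

```python
from itertools import combinations


def shared_case_edges(state_cases, min_shared=10):
    """Return {(state_a, state_b): n_shared} for every pair of states whose
    case sets overlap in at least `min_shared` cases.

    state_cases: dict mapping a state name to the set of case IDs carrying it.
    """
    edges = {}
    for a, b in combinations(sorted(state_cases), 2):
        n_shared = len(state_cases[a] & state_cases[b])
        if n_shared >= min_shared:
            edges[(a, b)] = n_shared
    return edges
```

Setting `min_shared=1` would count the pairs that share at least one case, and `min_shared=10` the pairs used as network edges. The pairwise loop is quadratic in the number of states, which is acceptable for a few thousand states.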

Merged Network Analysis

The simple networks were clustered into several merged biomarker networks using a naïve (non-hypothesis) biological criterion: that the constituent simple networks contain shared SNP genotypes. The largest of these merged networks exhibited no genetic overlap with the others.

The graph of this largest merged network is shown in Figure 7. Nodes correspond to SNPs, and edges to co-occurrence in underlying simple networks (grey lines) or in the same haplotypes (i.e., in linkage disequilibrium, LD, with r² > 0.8, yellow lines). Distance between nodes reflects the number of simple networks in which the two corresponding SNPs co-occur. Nodes with red borders are SNPs with non-zero genotypes (i.e., with at least one minor allele), and node size corresponds to the odds ratio of the SNPs. The largest nodes occur in over 1,500 states (out of 3,045 in total), but half of the SNPs are present in fewer than 10 states.

Interestingly, the topology of the 5% FDR graph (grey nodes) correlates well with the clustering of simple networks at the stricter FDR threshold of 1% (blue nodes). That is, these three 1% networks are founders of different communities found at FDR 5% (see Figure 7, where SNPs found at FDR 1% are coloured blue, and the three FDR 1% networks are marked with their respective numbers, i.e., #1, #2, #3).

Figure 7. Clustering of merged biomarker networks

This graph comprises 841 SNPs, which correspond to 744 independent haplotypes (SNPs that are in approximate linkage equilibrium). As expected, almost all variants are located outside protein-coding regions (intronic, intergenic or within ncRNAs, as shown in Figure 8).

Figure 8. Genomic location of SNPs
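The merging criterion used in this section (simple networks are joined when they contain a shared SNP genotype) amounts to finding connected components. The sketch below uses a small union-find structure for this; it is one plausible way to do it, assuming each simple network is given as a set of (SNP ID, genotype) pairs, and the identifiers in the usage example are made up.

```python
def merge_networks(simple_networks):
    """Group simple networks that are linked, directly or transitively,
    by at least one shared (snp_id, genotype) pair.

    simple_networks: dict mapping a network name to a set of (snp_id, genotype).
    Returns a list of sets of network names, largest group first.
    """
    parent = {name: name for name in simple_networks}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}  # (snp_id, genotype) -> first network seen with it
    for name, features in simple_networks.items():
        for feature in features:
            if feature in first_owner:
                parent[find(name)] = find(first_owner[feature])
            else:
                first_owner[feature] = name

    groups = {}
    for name in simple_networks:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


if __name__ == "__main__":
    example = {
        "A": {("rs1", 1), ("rs2", 0)},
        "B": {("rs2", 0), ("rs3", 2)},
        "C": {("rs3", 2)},
        "D": {("rs9", 1)},
    }
    print(merge_networks(example))
```

Note that the same SNP with a different genotype does not link two networks in this sketch, since the criterion in the text is stated in terms of shared SNP genotypes.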

Identifying Druggable Targets

The vast majority of SNPs in the networks are non-coding, and annotating a GWAS SNP with an expression quantitative trait locus (eQTL) can help to highlight candidate causal genes within a locus (i.e., the eQTL target gene). The SNPs in one of the sub-networks that were identified as eQTLs using the Genotype-Tissue Expression (GTEx) project are coloured blue in Figure 9 (i.e., those with eGenes at p-value < 10⁻²). Nodes corresponding to twelve variants with p-value < 10⁻⁵ and non-zero genotypes are labelled with their respective eGenes.

Another way to find candidate causal variants is to identify those located in regulatory regions of the genome, such as promoters, transcription start sites (TSSs), enhancers or transcriptionally active regions of open chromatin. As these regions can be characterized by specific patterns of histone modifications and ChIP-seq enrichments, various genome-wide datasets can be used to predict them. The Roadmap Epigenomics Consortium (Bernstein BE, 2010) has used a variety of genome-wide methods to study the chromatin state of non-coding regions in the human genome (Roadmap Epigenomics Consortium, 2015). ChromHMM (Ernst, 2012) can integrate these chromatin datasets to discover major recurring combinatorial and spatial patterns of marks and to systematically annotate the genome.

SNPs from the network (together with highly correlated variants) were annotated with precomputed ChromHMM states (15-state core model or 25-state model incorporating imputed data), based on multiple datasets available for breast mammary tissue. In Figures 9 and 10, nodes corresponding to variants within predicted promoters, enhancers, TSSs or DNase-hypersensitive regions are coloured blue.

Figure 9. SNPs with an eQTL

Figure 10. SNPs with epigenetic marks

Druggability of the eGenes' protein products can be predicted, i.e., based on homology to known drug targets (kinases, receptors, proteases, etc.), or may already be established experimentally with various levels of confidence, in vitro or, for existing drugs, through their mechanism of action. We checked both predicted and validated druggability of eGenes using dGene (Kumar 2013), DrugBank (Wishart DS, 2006) and ChEMBL (Gaulton A, 2012) resources.

The druggability of the targets represented in one of the smaller networks was studied in more detail. Requiring that the associated eQTL target be correlated with a given SNP in breast tissue at a p-value below 0.01, there were three strong potential drug targets among the SNP targets. The strongest of these belongs to a community of eight SNPs shown in Figure 11. These form a complete graph and occur in four simple networks, which in turn occur in 53 cases and 0 controls. The variant is located within the ORF of a gene coding for a druggable target that is known to be related to breast cancer metastasis potential.

Figure 11. SNPs relating to druggable target

Finding Disease Protective Effects

A key opportunity offered by precisionlife MARKERS is to reverse the cases and controls in order to identify features that are associated with disease protective effects rather than disease risk effects. Using this reversed controls:cases analysis approach on the BRCA2 population, we have successfully found a number of such protective effects, including several that may work to reduce an individual's overall lifetime risk of developing breast cancer.

To perform this analysis, the BRCA2 population was first segregated more stringently to enable better differentiation, particularly of potentially protective effects where BRCA2-positive people have not experienced early onset of breast cancer. The population was split such that:

- Cases included all non-affected people who had not developed breast cancer by the age of 55
- Controls included all affected people who had developed breast cancer before the age of 40

The findings of this disease protective effect study are described below:

Figure 12. Disease population segregation

Study: BRCA2-55 Population
Number of cases (non-affected >55 years): 1,458
Number of controls (affected <40 years): 1,576
False discovery rate: 5%

Table 2. SNP networks associated with significant disease protective effect using an FDR of 5%

In total, the protective effect factors are found in 451 out of 1,458 cases (non-affected women), giving a penetrance of 30.9%.

It should be acknowledged that the size of the population was somewhat limiting for this analysis and that the arbitrary cut-off of age 55 means that some non-affected people might go on to develop the disease later in life. A separate study of BRCA2 mutation carriers suggests that approximately 72-78% of the lifetime disease risk will have been encountered by this age in the BRCA2-positive population (Kuchenbaecker, 2017).
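The reversed case/control split and the penetrance figure quoted for the protective-effect study can be written out as a short sketch. The field names are hypothetical, and the handling of the boundary age (exactly 55) is an assumption, since the text gives both "by the age of 55" and ">55 years".

```python
def protective_study_group(affected: bool, age: int):
    """Assign a person to the reversed design used for protective effects.

    age is the age at diagnosis for affected people, or the age at last
    follow-up for non-affected people (an assumed convention).
    Returns "case", "control", or None if the person is excluded.
    """
    if not affected and age >= 55:  # boundary handling is an assumption
        return "case"      # non-affected: the group of interest here
    if affected and age < 40:
        return "control"   # early-onset breast cancer
    return None            # everyone else is left out of this study


def penetrance(n_carrying_feature: int, n_group: int) -> float:
    """Fraction of a group in which the feature (network) is found."""
    return n_carrying_feature / n_group


if __name__ == "__main__":
    # Figures from the study description: 451 of 1,458 non-affected cases.
    print(f"{penetrance(451, 1458):.1%}")
```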
If the number of controls were larger (approaching a case:control ratio of 1:3 instead of 1:1) and segregation even more stringent, it is possible that some of these protective effects (particularly those with fewer states and cases, i.e., network 5) would become less significant and might even fail the p-value and FDR tests. Nonetheless, for the first time, this study identified protective combinations of multiple genetic factors from a GWAS population dataset.

Knowledge of the genes implicated in the protective effects can be used to identify new druggable R&D opportunities for further research. The known gene associations in Table 2 include a number of recognized druggable candidate targets. This knowledge could also be used to improve the accuracy and specificity of current disease risk scoring models and genetic tests, and to inform a personal disease risk scoring tool that incorporates the impact of all the combinations of networks that a patient's genome contains.

Even a patient presenting with a risky BRCA2 mutation, who would at the moment be given a high group-based disease risk score, may have a personal risk significantly different from what a single-locus or even multi-SNP panel test would suggest. This has not been previously demonstrated and offers a more tailored (i.e. more precise) treatment option towards the goal of realizing personalized medicine.

Next Steps

Our next steps will include replication of these findings in larger datasets with higher-resolution SNP arrays and the incorporation of non-genomic data. We are also actively pursuing analysis of a variety of populations in other disease areas.

Conclusions

These results of our re-analysis of the CIMBA dataset are intriguing. This breast cancer population dataset has been analyzed multiple times using conventional analytical tools, and yet we have uncovered several novel and significant findings that could only be identified in the context of high-order (5-13) SNP combinations. Work continues with phenotypic / pathological data.

precisionlife MARKERS offers a powerful new lens through which to view diseases and identify new opportunities for:

- better patient risk scoring including disease protective effects
- novel target discovery and R&D directions
- drug repurposing and adaptive trials design

For more information please contact brca2@precisionlife.com

Methods

Single-locus association tests were computed with PLINK using Fisher's exact test. The Manhattan plot was generated with R qqman. The results reported above refer to identified variants or their highly correlated neighbours from haplotypes. LD clumping was done using PLINK at the r² cutoff of 0.8. Genes were associated using HaploReg, taking into account genomic proximity, LD and cis-eQTL. Breast cancer associations were taken from GWAS Catalog. Gene set enrichment for associated genes was performed with DEPICT and DAVID. Regulatory elements (promoters, enhancers, TSSs) were assigned based on data from HaploReg. eQTLs were assigned using the ENSEMBL API to the GTEx data at the p-value thresholds of 0.05 and … Minor allele frequency was taken from LDproxy, based on European subpopulation. Obsolete SNPs were merged according to data from dbSNP 150. Odds ratios for single variants were calculated in reference to the given SNP genotype. Druggability of the genes was assessed with PharmGKB, dGene, ChEMBL, and DrugBank resources. Visualization of networks was done using Cytoscape and in-house scripts.

References

Ahlqvist, E., et al. (2018). Novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables. Lancet Diabetes & Endocrinology S (18):
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Royal Stat Soc., 57(1):
Bernstein, B.E., et al. (2010). The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol., 28(10):
Boyle, E.A., Li, Y., & Pritchard, J.K. (2017). An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell, 169(7):
Chakravarti, A., & Turner, T.N. (2016). Revealing rate-limiting steps in complex disease biology: The crucial importance of studying rare, extreme-phenotype families. BioEssays, 38(6):
Chenevix-Trench, G., Milne, R.L., Antoniou, A.C., Couch, F.J., Easton, D.F., Goldgar, D.E., & CIMBA. (2007). An international initiative to identify genetic modifiers of cancer risk in BRCA1 and BRCA2 mutation carriers: the Consortium of Investigators of Modifiers of BRCA1 and BRCA2 (CIMBA). Breast Cancer Res, 9(2):104.
ENCODE Project Consortium. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489:
Ernst, J., & Kellis, M. (2012). ChromHMM: automating chromatin-state discovery and characterization. Nat Methods., 9(3):215-6
Gaulton, A., et al. (2012). ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res., D
GTEx Consortium. (2013). The Genotype-Tissue Expression (GTEx) project. Nat Genet., 45(6):
Hamdi, Y., et al. (2017). Association of breast cancer risk in BRCA1 and BRCA2 mutation carriers with genetic variants showing differential allelic expression: identification of a modifier of breast cancer risk at locus 11q22.3. Breast Cancer Res Treat., 161(1):
Kuchenbaecker, K.B., et al. (2017). Risks of Breast, Ovarian and Contralateral Breast Cancer for BRCA1 and BRCA2 Mutation Carriers. JAMA, 317(23):
Kumar, R.D., Chang, L.W., Ellis, M.J., & Bose, R. (2013). Prioritizing Potentially Druggable Mutations with dGene: An Annotation Tool for Cancer Genome Sequencing Data. PLoS ONE, 8(6):e67980
Li, Y.I., van de Geijn, B., Raj, A., Knowles, D.A., Petti, A.A., Golan, D., Gilad, Y., & Pritchard, J.K. (2016). RNA splicing is a primary link between genetic variation and disease. Science, (352):
Low, S.K., Zembutsu, H., & Nakamura, Y. (2018). Breast cancer: The translation of big genomic data to cancer precision medicine. Cancer Sci., 109(3):
Mavaddat, N., et al. (2015). Prediction of breast cancer risk based on profiling with common genetic variants. J Natl Cancer Inst., 107(5)
Mellerup, E., Andreassen, O.A., Bennike, B., Dam, H., Djurovic, S., Jorgensen, M.B., Kessing, L.V., Koefoed, P., Melle, I., Mors, O., & Moeller, G.L. (2017). Combinations of genetic variants associated with bipolar disorder. PLoS ONE, 12(12):e
Roadmap Epigenomics Consortium, et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature, 518(7539):
Trynka, G.S., Sandor, C., Han, B., Xu, H., Stranger, B.E., Liu, X.S., & Raychaudhuri, S. (2013). Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat Genet., 45(2):
Visscher, P.M., Wray, N.R., Zhang, Q., Sklar, P., McCarthy, M.I., Brown, M.A., & Yang, J. (2017). 10 Years of GWAS Discovery: Biology, Function, and Translation. Am J Hum Genet., 101(1):5-22.
Wan, X., Yang, C., Yang, Q., Xue, H., Fan, X., Tang, X.L., & Yu, W. (2010). BOOST: a fast approach to detecting gene-gene interactions in genome-wide case-control studies. Am J Hum Genet., 87(3):
Wishart, D.S., et al. (2008). DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res., 36:D


More information

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. Supplementary Figure 1 Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. (a) Pearson correlation heatmap among open chromatin profiles of different

More information

Nature Structural & Molecular Biology: doi: /nsmb.2419

Nature Structural & Molecular Biology: doi: /nsmb.2419 Supplementary Figure 1 Mapped sequence reads and nucleosome occupancies. (a) Distribution of sequencing reads on the mouse reference genome for chromosome 14 as an example. The number of reads in a 1 Mb

More information

White Paper Guidelines on Vetting Genetic Associations

White Paper Guidelines on Vetting Genetic Associations White Paper 23-03 Guidelines on Vetting Genetic Associations Authors: Andro Hsu Brian Naughton Shirley Wu Created: November 14, 2007 Revised: February 14, 2008 Revised: June 10, 2010 (see end of document

More information

LTA Analysis of HapMap Genotype Data

LTA Analysis of HapMap Genotype Data LTA Analysis of HapMap Genotype Data Introduction. This supplement to Global variation in copy number in the human genome, by Redon et al., describes the details of the LTA analysis used to screen HapMap

More information

Lecture 20. Disease Genetics

Lecture 20. Disease Genetics Lecture 20. Disease Genetics Michael Schatz April 12 2018 JHU 600.749: Applied Comparative Genomics Part 1: Pre-genome Era Sickle Cell Anaemia Sickle-cell anaemia (SCA) is an abnormality in the oxygen-carrying

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Fig 1. Comparison of sub-samples on the first two principal components of genetic variation. TheBritishsampleisplottedwithredpoints.The sub-samples of the diverse sample

More information
