TADA: Analyzing De Novo, Transmission and Case-Control Sequencing Data
Each person inherits mutations from their parents, some of which may predispose the person to certain diseases. Meanwhile, new mutations may arise spontaneously during reproduction, and if they disrupt key genes, such de novo mutations may increase the risk of disease. TADA (Transmission And De novo Association test) is a Bayesian model that combines data from de novo mutations, inherited variants in families, and standing variants in the population (identified in case-control studies). This approach substantially increases the power of gene discovery, as we demonstrated in studies of exome sequencing data from Autism Spectrum Disorder (ASD).

Website:
Author: Xin He <xinhe2@gmail.com>, Lane Center of Computational Biology, Carnegie Mellon University
Reference: Integrated Model of De Novo and Inherited Genetic Variants Yields Greater Power to Identify Risk Genes, Xin He, et al., PLoS Genetics, 2013

TADA-Denovo: TADA can also be used to analyze only the de novo mutations from exome sequencing data. This makes the analysis considerably easier: the program is easier to parameterize and much faster. We created a specialized version of TADA for this purpose and call it TADA-Denovo. Below we describe the use of TADA and TADA-Denovo in two separate sections, so you can decide which program best suits your needs.

The files in the package include:
TADA.R: R functions of TADA.
TADA_demo.R: R code demonstrating the use of TADA, using the Autism Spectrum Disorder (ASD) data.
TADA_denovo.pdf: explains the advantages of using TADA-Denovo for analyzing de novo mutations.
TADA_denovo_demo.R: R code demonstrating the use of TADA-Denovo.
ASC_2231trios_1333trans_1601cases_5397controls.csv: the ASD data used by the demonstration code.
known_asd_genes.csv: a short list of 20 published ASD genes.
TADA_results.csv: the results of running TADA on the ASD data.
TADA_denovo_results.csv: the results of running TADA-Denovo on the ASD data.

Background

In this section, we explain the background you need in order to use the software. If you plan to use TADA-Denovo only, you can skip the explanations in this section about variant counts in the transmission and case-control data.

Variant collapsing and categories:
In TADA, all mutations/variants of a given type (e.g. loss-of-function, or LoF) in a gene are collapsed and effectively treated as a single variant. We can therefore talk about the relative risk (called gamma in the model) and the allele frequency (called q) of this variant. TADA generally considers two types of variants: LoF and missense. In our experiments, we further restrict missense variants to those predicted to be "probably damaging" to protein function by PolyPhen-2 (denoted mis3 variants).

Variant counts: The main input of the TADA function (see below) is the variant counts of each gene to be tested. For LoF variants, the counts of a gene consist of three numbers: the number of de novo LoF mutations in trios, the number of LoF variants in cases, and the number of LoF variants in controls. Transmission data are easily incorporated in TADA: the number of transmitted variants is treated the same as case variants (added to the case count), and the number of non-transmitted variants is treated the same as control variants (added to the control count). If you do not have transmission data, simply skip this step. In the sample file, ASC_2231trios_1333trans_1601cases_5397controls.csv, each row gives the counts of one gene. The columns are named dn.lof, case.lof and ctrl.lof. If you have transmission data, combine the transmitted count with the case count, and the non-transmitted count with the control count, before calling the TADA function. The sample sizes need to be modified accordingly (each transmission family adds one to the case sample size and one to the control sample size).

TADA-Denovo

When one has only de novo mutations from family data, TADA-Denovo is the program to use. The simple approach to analyzing de novo data is a Poisson test on the number of de novo mutations in a gene, comparing it with the expected number based on the estimated mutation rate.
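For comparison, the simple Poisson test can be sketched in Python. TADA itself is written in R, and the helper names below are hypothetical, not functions from TADA.R. The sketch assumes that the expected number of de novo mutations in a gene under the null is 2 x N x mu (N families, each child carrying two newly transmitted genomes) and computes the one-sided p-value P(X >= x).

```python
import math


def poisson_pmf(i, lam):
    """P(X = i) for X ~ Poisson(lam), computed in log space (requires lam > 0)."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def poisson_upper_tail(x, lam):
    """One-sided p-value P(X >= x) for X ~ Poisson(lam)."""
    if x <= 0:
        return 1.0
    if x <= lam:
        # The tail is not small here, so 1 - CDF(x - 1) loses no meaningful precision.
        return max(0.0, 1.0 - sum(poisson_pmf(i, lam) for i in range(x)))
    # The tail is small: sum P(X = i) for i >= x directly instead of 1 - (number near 1).
    total, i, term = 0.0, x, poisson_pmf(x, lam)
    while True:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1); terms shrink because i > lam
        if term <= total * 1e-16:
            return total


def denovo_poisson_pvalue(x, n_families, mu):
    """Hypothetical helper: test x de novo mutations against the expected 2 * N * mu."""
    return poisson_upper_tail(x, 2 * n_families * mu)


# Example with illustrative values: 3 de novo mutations, mu = 1e-5, 1,000 trios.
print(denovo_poisson_pvalue(3, 1000, 1e-5))
```

If SciPy is available, `scipy.stats.poisson.sf(x - 1, lam)` returns the same upper-tail probability.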
The main benefit of TADA-Denovo is that it takes advantage of the functional annotations of the mutations; for example, a de novo nonsense mutation is weighted more heavily than a de novo missense mutation. We explain the rationale and the model details of TADA-Denovo in the file TADA_denovo.pdf. This package includes code that illustrates the use of TADA-Denovo; please see the file TADA_denovo_demo.R.

Running TADA-Denovo

In the section "Application of TADA-Denovo" of the demo file, we compute Bayes factors (BFs) and p-values for a set of genes. This code can be slightly modified for your own analysis. The main function is:

TADA.denovo(counts, N, mu, mu.frac, gamma.mean, beta)

counts: the count data, an m x K matrix, where m is the number of genes and K is the number of mutational categories. counts[i,j] is the number of de novo mutations in the j-th category of the i-th gene.
N: the sample size, i.e. the number of families.
mu: the mutation rates of the genes (an m-dimensional vector).
mu.frac: a K-dimensional vector; each element is multiplied by the gene-level mutation rate to obtain the mutation rate of a specific mutational category.
gamma.mean: the mean relative risks (RR), one value per mutational category.
beta: the other parameter of the RR distribution. The RR of a gene follows the Gamma distribution Gamma(gamma.mean*beta, beta), i.e. shape gamma.mean*beta and rate beta, so that its mean is gamma.mean.
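To make the model concrete, here is a minimal Python sketch of a de novo BF, not the code in TADA.R. It assumes that the count in one category is Poisson with mean 2 x N x mu x mu.frac under the null, and with that mean multiplied by RR ~ Gamma(shape gamma.mean*beta, rate beta) under the alternative. Integrating out RR gives a negative binomial marginal, so the BF has a closed form. Taking one beta per category and combining categories by multiplying their BFs are assumptions of this sketch; the function names are hypothetical.

```python
import math


def denovo_log_bf(x, n, mu, gamma_mean, beta):
    """Log BF for one gene and one category: Poisson-Gamma alternative vs Poisson null.

    Null: x ~ Poisson(lam) with lam = 2 * n * mu.
    Alternative: x ~ Poisson(lam * RR), RR ~ Gamma(shape=gamma_mean * beta, rate=beta),
    whose marginal is negative binomial, so the ratio has a closed form.
    """
    lam = 2 * n * mu
    a, b = gamma_mean * beta, beta
    return (math.lgamma(a + x) - math.lgamma(a)
            + a * math.log(b / (b + lam))
            - x * math.log(b + lam)
            + lam)


def gene_bf(counts, n, mu, mu_frac, gamma_mean, beta):
    """BF of one gene: product (assumed) of the per-category BFs."""
    log_bf = sum(denovo_log_bf(x, n, mu * f, g, bt)
                 for x, f, g, bt in zip(counts, mu_frac, gamma_mean, beta))
    return math.exp(log_bf)


# Illustrative values only: one LoF and no mis3 mutation among 1,000 families.
print(gene_bf([1, 0], 1000, 1e-5, [0.1, 0.3], [20.0, 5.0], [1.0, 1.0]))
```

Working on the log scale until the final step avoids overflow when a gene carries several de novo mutations and its BF becomes very large.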
The result of running this function is the BFs of all input genes, in the same order as the input. It is possible to obtain p-values, though we recommend the Bayesian FDR control procedure described below. The function TADAp.denovo(counts, N, mu, mu.frac, gamma.mean, beta, l=100) computes p-values by generating random mutational data: for each gene, we use its mutation rate to sample the number of de novo mutations in the gene, assuming it is not a susceptibility gene. This sampling procedure is repeated l times, and we apply TADA-Denovo to the sampled data to obtain the null distribution of BFs. Typically l = 100 should be sufficient for whole-exome sequencing data. The minimum p-value that can be obtained is approximately 1/(l x 20,000) = 1/(100 x 20,000) = 5 x 10^-7 (assuming a total of 20,000 human genes).

To control the FDR, we use a Bayesian approach, called the Direct Posterior Approach [1], which determines the BF threshold at a given FDR. For the convenience of users, the software provides:

Bayesian.FDR(BF, pi0)

BF: BFs sorted in decreasing order.
pi0: the prior probability that the null hypothesis is true.

The results (in the field FDR) are the q-values of the input BFs, in the same order.

Model parameterization

The section "Estimation of de novo parameters using Method of Moment approach" of the demo file explains how a user could set the parameters of TADA-Denovo. First, the mutation rate of a gene is defined as its total single-nucleotide substitution rate. The mutation rates of the input genes should be provided in the input file. In our analysis of the ASD data, the mutation rates of all human genes were based on [2], but users could obtain the rates from other resources. In addition, since TADA treats each type of mutation (LoF or missense) separately, we need to specify the rate of each type of mutation as a fraction of the total gene-level mutation rate.
In our analysis of the ASD data, we used the number of de novo mutations in a control dataset (unaffected siblings) to obtain these relative fractions (see the Methods section of our paper). For LoF mutations, this is … of the total gene mutation rate, and for mis3 mutations (predicted probably damaging by PolyPhen-2), it is 0.32 of the gene mutation rate.

Next, we estimate the two parameters of the RR distribution, gamma.mean and beta, for each variant category. This is explained in the demo code, and we encourage users to read TADA_denovo.pdf for the details of how they should be estimated. The key function is:

denovo.mom(N, mu, C, beta, k)

N: the number of families.
mu: the mutation rates of the genes.
C: the observed number of de novo mutations (for a given category).
beta: the beta parameter of the RR distribution.
k: the number of susceptibility genes.
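The method-of-moments idea behind denovo.mom can be sketched in Python under simplifying assumptions; this is an illustration, not the package code. If k of the m genes are risk genes with mean RR gamma.mean and the rest have RR 1, the expected total count is 2 x N x sum(mu) x (1 + (k/m)(gamma.mean - 1)), which can be solved for gamma.mean given the observed C. The expected number of multiple-hit genes then treats each gene as a risk gene with probability k/m. Here mu is assumed to be the category-specific rate (gene rate times mu.frac), and all function names are hypothetical.

```python
import math


def mom_gamma_mean(n, mu, c, k):
    """Solve C = 2 * N * sum(mu) * (1 + (k / m) * (gamma_mean - 1)) for gamma_mean."""
    m = len(mu)
    enrichment = c / (2 * n * sum(mu))  # observed / expected de novo count under the null
    return (enrichment - 1) * m / k + 1


def expected_multihit(n, mu, k, gamma_mean, beta):
    """Expected number of genes with >= 2 de novo mutations; each gene is a risk gene w.p. k/m."""
    if gamma_mean <= 0 or beta <= 0:
        raise ValueError("gamma_mean and beta must be positive")
    pi = k / len(mu)
    a, b = gamma_mean * beta, beta
    total = 0.0
    for mu_i in mu:
        lam = 2 * n * mu_i
        null_ge2 = 1 - math.exp(-lam) * (1 + lam)   # Poisson: 1 - P(0) - P(1)
        p = b / (b + lam)
        risk_ge2 = 1 - p ** a * (1 + a * (1 - p))    # negative binomial: 1 - P(0) - P(1)
        total += pi * risk_ge2 + (1 - pi) * null_ge2
    return total


def denovo_mom(n, mu, c, beta, k):
    """Hypothetical analogue of denovo.mom: returns gamma_mean and the expected M."""
    gamma_mean = mom_gamma_mean(n, mu, c, k)
    return {"gamma_mean": gamma_mean, "M": expected_multihit(n, mu, k, gamma_mean, beta)}


def choose_k(n, mu, c, beta, observed_m, k_grid):
    """Pick the k whose expected number of multiple-hit genes is closest to the observed one."""
    return min(k_grid, key=lambda k: abs(denovo_mom(n, mu, c, beta, k)["M"] - observed_m))
```

Scanning k this way follows the estimation strategy described in this section; the selected k also gives pi0 = 1 - k/m.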
The function returns the expected number of genes with more than one de novo mutation in the given category, or simply multiple-hit genes (the field M), and the mean relative risk for the given parameters (the field gamma.mean). The basic strategy of parameter estimation is to run this function at different values of k and choose the value that minimizes the difference between the expected and the observed number of multiple-hit genes.

Finally, we also need the value of pi0, the prior probability that the null hypothesis is true. This follows directly from the previous step, which estimates k, the number of susceptibility genes: k divided by the total number of genes gives 1 - pi0. Note that pi0 only needs to be estimated once, for LoF mutations.

Simulation

In the section "Simulation to assess the power of TADA.denovo" of the demo code, we illustrate how to use simulation for power analysis. The main function is:

eval.tada.denovo(N, mu, mu.frac, pi, gamma.mean, beta, gamma.mean.est, beta.est, FDR=0.1)

N: the number of families.
mu: the mutation rates of the genes.
mu.frac: the constants multiplied by the total mutation rates.
pi: the fraction of susceptibility genes.
gamma.mean, beta: the parameters of the RR distribution used to generate the simulated data.
gamma.mean.est, beta.est: the parameters used by the TADA-Denovo function.
FDR: the desired FDR level.

The function returns the expected number of discoveries at the given FDR level.

TADA

When one has both de novo mutations and inherited data (transmitted variants called from sequencing data of families, case-control data, or both), TADA can take advantage of all the data. We encourage users to read the section on TADA-Denovo first, as a number of points are shared between the two, and we believe it is always good to run TADA-Denovo first even if one has the full data.
Our experience is that the de novo data are generally more reliable and informative than the inherited data, probably because (1) de novo mutations tend to have higher relative risks, and (2) case-control data are susceptible to population stratification. This package includes code that illustrates the use of TADA; please see the file TADA_demo.R.

Running TADA

The section "Application of TADA" in the demo file illustrates how to use TADA to obtain BFs for a given set of genes. The main function is:

TADA(counts, N, mu, mu.frac, hyperpar)
counts: an m x 3K matrix, where m is the number of genes and K is the number of variant categories. Each category has three numbers: the de novo, case and control counts.
N: the sample sizes, with three values for de novo, case and control, respectively.
mu: the mutation rates of the genes (an m-dimensional vector).
mu.frac: a K-dimensional vector; each element is multiplied by the gene-level mutation rate to obtain the mutation rate of a specific mutational category.
hyperpar: an 8 x K matrix, where each row corresponds to one of the 8 parameters (gamma.mean.dn, beta.dn, gamma.mean.cc, beta.cc, rho1, nu1, rho0, nu0) and each column corresponds to one variant category.

The eight parameters are:
gamma.mean.dn, beta.dn: the parameters of the RR distribution of de novo mutations. The RR of a de novo mutation in a given category follows the distribution Gamma(gamma.mean.dn*beta.dn, beta.dn).
gamma.mean.cc, beta.cc: the parameters of the RR distribution of inherited variants, analogous to the de novo parameters above.
rho1, nu1: the parameters of the distribution of q (the frequency of a given type of variant) under the alternative model (the gene is a risk gene); the prior distribution is Gamma(rho1, nu1).
rho0, nu0: the parameters of the q distribution under the null model.

The result of running this function is the BFs of all input genes, in the same order. FDR control can be implemented with the Bayesian procedure explained before. To obtain p-values, one can use the function TADAp(counts, N, mu, mu.frac, hyperpar, l=100). This is similar to the TADAp.denovo() function described in the previous section, except that it also generates randomized inherited data (equivalent to permuting the case-control labels) in addition to randomized de novo mutations. See the relevant part of the TADA-Denovo section.

Model parameterization

The section "Estimation of parameters of the prior distributions" of the demo file explains how a user could set the parameters of TADA.
Please also read the section "Transmission And De novo Association test (TADA)" in the Supplement of our paper (to be added). For the parameters related to de novo mutations, we refer users to the relevant part of the TADA-Denovo section above. For the RR parameters of the inherited variants, we assume that a set of genes known to be involved in the disease of interest is available. We then use the fold-enrichment of the variants in cases vs. controls as the approximate mean RR (gamma.mean.cc). The method is generally not very sensitive to the parameter beta.cc, so we suggest choosing a value such that the prior RR distribution falls in a reasonable range (e.g. most of the probability mass lies above 1 but below 5). However, if there is no evidence that the inherited variants of a certain category are enriched in cases over controls among the known risk genes (or evidence of transmission disequilibrium), we suggest simply ignoring this type of variant by setting gamma.mean.cc=1 and beta.cc=1000 (an arbitrarily large number). For the prior parameters of q, we suggest estimating the mean frequency of the variant category, which equals rho1/nu1 and rho0/nu0 (we assume the two are equal). Then choose nu1 and nu0 to be small relative to the sample size, e.g. 100 or 200.
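The suggestions above for the inherited-variant priors can be sketched in Python. Several choices here are assumptions, not the package's procedure: gamma.mean.cc is estimated as the pooled case/control rate ratio over the known risk genes; the mean of q is estimated as the average per-gene control count divided by the number of controls (i.e. counts are assumed to scale with the number of individuals); and the prior RR mass between 1 and 5 is checked by Monte Carlo rather than an exact Gamma CDF. All function names are hypothetical.

```python
import random
import statistics


def fold_enrichment(case_counts, ctrl_counts, n_case, n_ctrl):
    """Pooled case/control rate ratio over known risk genes: a rough gamma.mean.cc."""
    return (sum(case_counts) / n_case) / (sum(ctrl_counts) / n_ctrl)


def prior_mass_between(gamma_mean, beta, lo=1.0, hi=5.0, draws=50_000, seed=1):
    """Monte Carlo share of Gamma(shape=gamma_mean*beta, rate=beta) mass in (lo, hi)."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); scale = 1 / rate
    samples = (rng.gammavariate(gamma_mean * beta, 1.0 / beta) for _ in range(draws))
    return sum(lo < s < hi for s in samples) / draws


def q_prior(ctrl_counts, n_ctrl, nu=200.0):
    """Gamma(rho, nu) prior on q whose mean rho / nu equals the average per-gene control rate."""
    q_mean = statistics.mean(c / n_ctrl for c in ctrl_counts)
    return q_mean * nu, nu
```

Monte Carlo keeps the sketch free of dependencies; with SciPy, `scipy.stats.gamma.cdf(x, a, scale=1/beta)` gives the exact probability mass instead.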
Simulation

In the section "Simulation to assess the power of TADA" of the demo code, we illustrate how to use simulation for power analysis. The main function is:

eval.tada(N, mu, mu.frac, pi, gamma.mean.dn, beta.dn, gamma.mean.cc, beta.cc, rho1, nu1, rho0, nu0, hyperpar.est, FDR=0.1, tradeoff=TRUE)

N: the number of families.
mu: the mutation rates of the genes.
mu.frac: the constants multiplied by the total mutation rates.
pi: the fraction of susceptibility genes.
gamma.mean.dn, beta.dn: the parameters of the RR distribution of de novo mutations used to generate the simulated data.
gamma.mean.cc, beta.cc: the parameters of the RR distribution of inherited variants used to generate the simulated data.
rho1, nu1, rho0, nu0: the parameters of the q (the frequency of a given type of variant) distribution.
hyperpar.est: the parameters used by the TADA function on the simulated data.
FDR: the desired FDR level.
tradeoff: whether to implement the relationship between q and RR during simulation (i.e. variants with higher RR are likely to have lower frequency). Recommended to be TRUE. See the section "Transmission And De novo Association test (TADA)" in the Supplement of our paper.

Reference

1. Newton, M.A., et al., Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics, (2).
2. Sanders, S.J., et al., De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature, (7397).
Research Methods in Forest Sciences: Learning Diary Yoko Lu 285122 9 December 2016 1. Research process It is important to pursue and apply knowledge and understand the world under both natural and social
More informationBurning debate: What s the best way to nab real autism genes?
OPINION, VIEWPOINT Burning debate: What s the best way to nab real autism genes? BY BRIAN O'ROAK 27 JUNE 2017 Over the past 10 years researchers have made tremendous progress in understanding the genetic
More informationMediation Analysis With Principal Stratification
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 3-30-009 Mediation Analysis With Principal Stratification Robert Gallop Dylan S. Small University of Pennsylvania
More informationEpigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017
Epigenetics Jenny van Dongen Vrije Universiteit (VU) Amsterdam j.van.dongen@vu.nl Boulder, Friday march 10, 2017 Epigenetics Epigenetics= The study of molecular mechanisms that influence the activity of
More informationHands-On Ten The BRCA1 Gene and Protein
Hands-On Ten The BRCA1 Gene and Protein Objective: To review transcription, translation, reading frames, mutations, and reading files from GenBank, and to review some of the bioinformatics tools, such
More information1. Create a mutation rate table from intergenic SNPs for all possible trinucleotide to trinucleotide changes
1. Create a mutation rate table from intergenic SNPs for all possible trinucleotide to trinucleotide changes ATCGGCTGG ATCGACTGG CCTAGCTAA CCTGGCTAA CTCACCGGA CTCACTGGA Change AAA ACA AAA AGA AAA ATA AAC
More informationPractical Bayesian Design and Analysis for Drug and Device Clinical Trials
Practical Bayesian Design and Analysis for Drug and Device Clinical Trials p. 1/2 Practical Bayesian Design and Analysis for Drug and Device Clinical Trials Brian P. Hobbs Plan B Advisor: Bradley P. Carlin
More information4. Model evaluation & selection
Foundations of Machine Learning CentraleSupélec Fall 2017 4. Model evaluation & selection Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr
More informationChapter 8: Two Dichotomous Variables
Chapter 8: Two Dichotomous Variables On the surface, the topic of this chapter seems similar to what we studied in Chapter 7. There are some subtle, yet important, differences. As in Chapter 5, we have
More informationPSSV User Manual (V2.1)
PSSV User Manual (V2.1) 1. Introduction A novel pattern-based probabilistic approach, PSSV, is developed to identify somatic structural variations from WGS data. Specifically, discordant and concordant
More informationTransmission Disequilibrium Methods for Family-Based Studies Daniel J. Schaid Technical Report #72 July, 2004
Transmission Disequilibrium Methods for Family-Based Studies Daniel J. Schaid Technical Report #72 July, 2004 Correspondence to: Daniel J. Schaid, Ph.D., Harwick 775, Division of Biostatistics Mayo Clinic/Foundation,
More informationQuantitative genetics: traits controlled by alleles at many loci
Quantitative genetics: traits controlled by alleles at many loci Human phenotypic adaptations and diseases commonly involve the effects of many genes, each will small effect Quantitative genetics allows
More informationNature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma.
Supplementary Figure 1 Mutational signatures in BCC compared to melanoma. (a) The effect of transcription-coupled repair as a function of gene expression in BCC. Tumor type specific gene expression levels
More informationGene Expression Analysis Web Forum. Jonathan Gerstenhaber Field Application Specialist
Gene Expression Analysis Web Forum Jonathan Gerstenhaber Field Application Specialist Our plan today: Import Preliminary Analysis Statistical Analysis Additional Analysis Downstream Analysis 2 Copyright
More informationBayesian Prediction Tree Models
Bayesian Prediction Tree Models Statistical Prediction Tree Modelling for Clinico-Genomics Clinical gene expression data - expression signatures, profiling Tree models for predictive sub-typing Combining
More informationMutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research
Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research Application Note Authors John McGuigan, Megan Manion,
More informationSection 6: Analysing Relationships Between Variables
6. 1 Analysing Relationships Between Variables Section 6: Analysing Relationships Between Variables Choosing a Technique The Crosstabs Procedure The Chi Square Test The Means Procedure The Correlations
More informationField wide development of analytic approaches for sequence data
Benjamin Neale Field wide development of analytic approaches for sequence data Cohort Allelic Sum Test (CAST; Hobbs, Cohen and others) Li and Leal (AJHG) Madsen and Browning (PLoS Genetics) C alpha and
More informationLecture 20. Disease Genetics
Lecture 20. Disease Genetics Michael Schatz April 12 2018 JHU 600.749: Applied Comparative Genomics Part 1: Pre-genome Era Sickle Cell Anaemia Sickle-cell anaemia (SCA) is an abnormality in the oxygen-carrying
More informationIN SILICO EVALUATION OF DNA-POOLED ALLELOTYPING VERSUS INDIVIDUAL GENOTYPING FOR GENOME-WIDE ASSOCIATION STUDIES OF COMPLEX DISEASE.
IN SILICO EVALUATION OF DNA-POOLED ALLELOTYPING VERSUS INDIVIDUAL GENOTYPING FOR GENOME-WIDE ASSOCIATION STUDIES OF COMPLEX DISEASE By Siddharth Pratap Thesis Submitted to the Faculty of the Graduate School
More informationPackage HAP.ROR. R topics documented: February 19, 2015
Type Package Title Recursive Organizer (ROR) Version 1.0 Date 2013-03-23 Author Lue Ping Zhao and Xin Huang Package HAP.ROR February 19, 2015 Maintainer Xin Huang Depends R (>=
More informationAsingle inherited mutant gene may be enough to
396 Cancer Inheritance STEVEN A. FRANK Asingle inherited mutant gene may be enough to cause a very high cancer risk. Single-mutation cases have provided much insight into the genetic basis of carcinogenesis,
More informationComparison of Gene Set Analysis with Various Score Transformations to Test the Significance of Sets of Genes
Comparison of Gene Set Analysis with Various Score Transformations to Test the Significance of Sets of Genes Ivan Arreola and Dr. David Han Department of Management of Science and Statistics, University
More informationARTICLE RESEARCH. Macmillan Publishers Limited. All rights reserved
Extended Data Figure 6 Annotation of drivers based on clinical characteristics and co-occurrence patterns. a, Putative drivers affecting greater than 10 patients were assessed for enrichment in IGHV mutated
More informationIn-house* validation of Qualitative Methods
Example from Gilbert de Roy In-house* validation of Qualitative Methods Aspects from a non forensic but analytical chemist *In-house in your own laboratory Presented at ENFSI, European Paint group meeting
More information(ii) The effective population size may be lower than expected due to variability between individuals in infectiousness.
Supplementary methods Details of timepoints Caió sequences were derived from: HIV-2 gag (n = 86) 16 sequences from 1996, 10 from 2003, 45 from 2006, 13 from 2007 and two from 2008. HIV-2 env (n = 70) 21
More informationBayes Factors for t tests and one way Analysis of Variance; in R
Bayes Factors for t tests and one way Analysis of Variance; in R Dr. Jon Starkweather It may seem like small potatoes, but the Bayesian approach offers advantages even when the analysis to be run is not
More informationWhat can genetic studies tell us about ADHD? Dr Joanna Martin, Cardiff University
What can genetic studies tell us about ADHD? Dr Joanna Martin, Cardiff University Outline of talk What do we know about causes of ADHD? Traditional family studies Modern molecular genetic studies How can
More informationLab 5: Testing Hypotheses about Patterns of Inheritance
Lab 5: Testing Hypotheses about Patterns of Inheritance How do we talk about genetic information? Each cell in living organisms contains DNA. DNA is made of nucleotide subunits arranged in very long strands.
More informationBST227 Introduction to Statistical Genetics. Lecture 4: Introduction to linkage and association analysis
BST227 Introduction to Statistical Genetics Lecture 4: Introduction to linkage and association analysis 1 Housekeeping Homework #1 due today Homework #2 posted (due Monday) Lab at 5:30PM today (FXB G13)
More informationCS2220 Introduction to Computational Biology
CS2220 Introduction to Computational Biology WEEK 8: GENOME-WIDE ASSOCIATION STUDIES (GWAS) 1 Dr. Mengling FENG Institute for Infocomm Research Massachusetts Institute of Technology mfeng@mit.edu PLANS
More informationReducing INDEL calling errors in whole genome and exome sequencing data.
Reducing INDEL calling errors in whole genome and exome sequencing data. Han Fang November 8, 2014 CSHL Biological Data Science Meeting Acknowledgments Lyon Lab Yiyang Wu Jason O Rawe Laura J Barron Max
More informationIntroduction to Computational Neuroscience
Introduction to Computational Neuroscience Lecture 11: Attention & Decision making Lesson Title 1 Introduction 2 Structure and Function of the NS 3 Windows to the Brain 4 Data analysis 5 Data analysis
More informationNew Enhancements: GWAS Workflows with SVS
New Enhancements: GWAS Workflows with SVS August 9 th, 2017 Gabe Rudy VP Product & Engineering 20 most promising Biotech Technology Providers Top 10 Analytics Solution Providers Hype Cycle for Life sciences
More information