Structured Association Advanced Topics in Computational Genomics


1 Structured Association Advanced Topics in Computational Genomics

2 Structured Association Lasso. Gflasso (Kim & Xing, 2009): greater power, fewer false positives, phenome associations.

3 Structured Association Lasso. Network-constrained regularization (Li & Li, 2008).

4 Regression with Regularization Fused lasso (Tibshirani et al., 2004)
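A standard way to write the fused lasso in penalized form is sketched below. The original paper states it with equivalent constraints; the symbols $\lambda_1$, $\lambda_2$ and the assumption that coefficients are ordered by index $j$ (so that neighbours $\beta_{j-1}, \beta_j$ are expected to be similar) are our notation. The first penalty gives sparsity; the second makes neighbouring coefficients equal, so the estimate tends to be piecewise constant.

```latex
\hat{\beta} = \arg\min_{\beta}\; (y - X\beta)^T (y - X\beta)
  + \lambda_1 \sum_{j=1}^{p} |\beta_j|
  + \lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}|
```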

5 Regression with Regularization (Fused Lasso). [Figure panels: standard regression; lasso; fusion penalty only; fused lasso. Black line: true values; red line: estimated values.]

6 Lasso for Reducing False Positives (Tibshirani, 1996). [Figure: trait = genotype × association strength, e.g., an association strength of 2.1.] Lasso penalty for sparsity: $\hat{\beta} = \arg\min_{\beta}\, (y - X\beta)^T (y - X\beta) + \lambda \sum_j |\beta_j|$. This gives many zero associations (sparse results), but what if there are multiple related traits?
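A minimal sketch of one common way to solve the lasso objective above: cyclic coordinate descent with soft-thresholding. Other solvers exist; the function names are hypothetical. Because the squared error is not halved, each coordinate is thresholded at $\lambda/2$.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Cyclic coordinate descent for
        minimize (y - X b)^T (y - X b) + lam * sum_j |b_j|.
    Setting the subgradient in b_j to zero gives a soft-threshold at lam / 2."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)        # x_j^T x_j for each column
    resid = y.astype(float).copy()       # r = y - X beta (beta starts at 0)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue                 # all-zero column: coefficient stays 0
            old = beta[j]
            # x_j^T (partial residual without feature j)
            rho = X[:, j] @ resid + col_sq[j] * old
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
            if beta[j] != old:
                resid -= X[:, j] * (beta[j] - old)   # keep residual in sync
                max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return beta
```

With a sparse true coefficient vector and a moderately large `lam`, most entries of the estimate are typically exactly zero, which is the sparsity the slide refers to. Any `lam` at least twice the largest $|x_j^T y|$ sets every coefficient to zero.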

7 Multivariate Regression for Multiple-Trait Association Analysis. [Figure: multiple traits (e.g., allergy, lung physiology) = genotype × association strengths (3.4, 1.5, 2.1, 0.9, 1.8).] Association strength between SNP j and trait k: $\beta_{jk}$. $\hat{B} = \arg\min_{B}\, \sum_k (y_k - X\beta_k)^T (y_k - X\beta_k) + \lambda \sum_j \sum_k |\beta_{jk}| + \text{(fusion penalty)}$. We introduce a graph-guided fusion penalty.

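8 Multiple-Trait Association: Graph-Constrained Fused Lasso. Step 1: thresholded correlation graph of phenotypes. Step 2: graph-constrained fused lasso, $\hat{B} = \arg\min_{B}\, \sum_k (y_k - X\beta_k)^T (y_k - X\beta_k) + \lambda \sum_j \sum_k |\beta_{jk}| + \gamma \sum_{(k,m) \in E} \sum_j |\beta_{jk} - \mathrm{sign}(r_{km})\, \beta_{jm}|$, where the second term is the lasso penalty, the third is the graph-constrained fusion penalty, $E$ is the edge set of the thresholded graph, and $r_{km}$ is the correlation between traits k and m.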
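Step 1 can be sketched as follows, assuming phenotypes are stored as an individuals × traits matrix. The threshold `tau` is a hypothetical parameter; the slide states neither its value nor whether it is applied to $r$ or $|r|$ (here $|r|$).

```python
import numpy as np

def thresholded_trait_graph(Y, tau):
    """Build a trait network from an (n individuals x K traits) matrix Y,
    keeping trait pairs whose |Pearson r| >= tau.
    Returns a list of (k, m, r_km) with k < m."""
    R = np.corrcoef(Y, rowvar=False)     # K x K trait correlation matrix
    K = R.shape[0]
    edges = []
    for k in range(K):
        for m in range(k + 1, K):
            if abs(R[k, m]) >= tau:
                edges.append((k, m, float(R[k, m])))
    return edges
```

Keeping the signed correlation in each edge lets a later fusion step treat negatively correlated traits differently from positively correlated ones.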

9 Fusion Penalty. [Figure: SNP j linked to trait k with association strength $\beta_{jk}$ and to trait m with association strength $\beta_{jm}$.] Fusion penalty: $|\beta_{jk} - \beta_{jm}|$. For two correlated traits (connected in the network), this penalty encourages the association strengths to take similar values.

10 Graph-Constrained Fused Lasso: Overall Effect. The fusion effect propagates through the entire network, yielding associations between SNPs and subnetworks of traits.

11 Multiple-Trait Association: Graph-Weighted Fused Lasso. Overall effect: subnetwork structure is embedded as densely connected nodes with large edge weights, while edges with small weights are effectively ignored.
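Both variants can be evaluated with a sketch like the one below. It assumes each fusion term is $|\beta_{jk} - \mathrm{sign}(r_{km})\beta_{jm}|$, where $r_{km}$ is the correlation between traits k and m, so negatively correlated traits are fused toward opposite signs. It also assumes the graph-weighted variant multiplies each edge's term by $|r_{km}|$, one natural choice of weight. `gflasso_objective` is a hypothetical helper, not part of any package.

```python
import numpy as np

def gflasso_objective(X, Y, B, edges, lam, gamma, weighted=False):
    """Graph-guided fused lasso objective for B (J SNPs x K traits).
    edges: list of (k, m, r_km), e.g. from a thresholded trait graph.
    weighted=False -> graph-constrained form (every edge weight 1);
    weighted=True  -> graph-weighted form (edge weight |r_km|, an assumption)."""
    resid = Y - X @ B
    loss = float(np.sum(resid ** 2))          # sum_k (y_k - X b_k)^T (y_k - X b_k)
    lasso = lam * float(np.abs(B).sum())      # lam * sum_j sum_k |b_jk|
    fusion = 0.0
    for k, m, r in edges:
        # sum_j |b_jk - sign(r_km) * b_jm|
        diff = float(np.abs(B[:, k] - np.sign(r) * B[:, m]).sum())
        fusion += (abs(r) if weighted else 1.0) * diff
    return loss + lasso + gamma * fusion
```

In the weighted form a weakly correlated edge contributes little to the penalty, which is how small-weight edges end up effectively ignored.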

12 Estimating Parameters. Both the graph-constrained fused lasso and the graph-weighted fused lasso can be written as quadratic programs, so many publicly available software packages for convex optimization can be used.

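13 Improving Scalability. The original problem is rewritten equivalently using a variational formulation and solved by iterative optimization that alternates between updating $\beta_k$ and updating the auxiliary variables $d_{jk}$ and $d_{jml}$.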
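The variational idea can be illustrated on a single penalty. Assume the bound $(\sum_i |a_i|)^2 \le \sum_i a_i^2 / d_i$ for $d_i > 0$ with $\sum_i d_i = 1$. This is Cauchy-Schwarz, with equality at $d_i = |a_i| / \sum_l |a_l|$. A squared-$\ell_1$ penalty can then be minimized by alternating a weighted ridge update for the coefficients with a closed-form update for $d$. This illustrates the technique only; it is not the lecture's exact reformulation, which attaches such auxiliary variables to every lasso and fusion term ($d_{jk}$, $d_{jml}$). The function names and the `eps` safeguard are hypothetical.

```python
import numpy as np

def optimal_d(a, eps=1e-12):
    """Minimizer over the simplex of sum_i a_i^2 / d_i, namely d_i = |a_i| / sum|a|.
    eps > 0 keeps every d_i strictly positive so later divisions are safe."""
    w = np.abs(a) + eps
    return w / w.sum()

def l1_squared_via_d(X, y, lam, n_iter=100, eps=1e-8):
    """Alternating minimization of ||y - X b||^2 + lam * sum_j b_j^2 / d_j
    over b and d (d on the simplex). For any such d this upper-bounds
    ||y - X b||^2 + lam * (sum_j |b_j|)^2."""
    p = X.shape[1]
    d = np.full(p, 1.0 / p)                  # start from uniform weights
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        # b-update: ridge regression with per-coefficient penalty lam / d_j
        b = np.linalg.solve(XtX + lam * np.diag(1.0 / d), Xty)
        # d-update: closed form that makes the bound tight
        d = optimal_d(b, eps)
    return b
```

Each step only requires a linear solve or a normalization, which is the source of the scalability gain over a generic quadratic-programming solver.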

14 Simulation Results. 50 SNPs taken from HapMap chromosome 7 (CEU population); 10 traits. [Figure panels: SNPs; trait correlation matrix; phenotypes; thresholded trait correlation network; true regression coefficients; single-SNP single-trait test (significant at α = 0.01); lasso; graph-guided fused lasso. Color scale from high association to no association.]

15 Asthma Trait Network. [Figure: phenotype correlation network with subnetworks for asthma symptoms, lung physiology, and quality of life.]

16 Results from Single-SNP/Trait Test. [Figure: SNP-by-phenotype association maps next to the trait network, with a color scale from high association to no association. Panels: single-marker single-trait test; permutation test α = 0.05; permutation test α = 0.01. Trait labels: lung physiology-related traits (I); baseline FEV1 predicted value: MPVLung; pre-FEF predicted value; average nitric oxide value: online; body mass index; postbronchodilation FEV1, liters: spirometry; baseline FEV1 % predicted: spirometry; baseline predrug FEV1, % predicted.] The Q551R SNP codes for amino-acid changes in the intracellular signaling portion of the receptor. Exon 11 SNPs.
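A permutation threshold of the kind quoted for the single-marker single-trait test can be sketched as below. Assumptions not stated on the slide: genotypes are coded 0/1/2, the test statistic is $|r|$, and the family-wise threshold is the $(1-\alpha)$ quantile of the maximum statistic over all SNP-trait pairs. Individuals are permuted jointly across traits so that the trait correlation structure is preserved. Function names are hypothetical.

```python
import numpy as np

def abs_corr_matrix(G, Y):
    """|Pearson r| between every SNP column of G (n x J) and every trait
    column of Y (n x K). Monomorphic SNPs (zero variance) are not allowed."""
    Gc = (G - G.mean(axis=0)) / G.std(axis=0)
    Yc = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return np.abs(Gc.T @ Yc) / G.shape[0]

def permutation_threshold(G, Y, alpha=0.05, n_perm=1000, seed=0):
    """(1 - alpha) quantile of max |r| over all SNP-trait pairs under
    permutation of individuals (rows of Y permuted together)."""
    rng = np.random.default_rng(seed)
    max_stats = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(Y.shape[0])
        max_stats[b] = abs_corr_matrix(G, Y[perm]).max()
    return float(np.quantile(max_stats, 1 - alpha))
```

SNP-trait pairs whose observed $|r|$ exceeds the threshold are called significant at level `alpha`; lowering `alpha` from 0.05 to 0.01 raises the threshold and leaves fewer associations.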

17 Comparison of Gflasso with Others. [Figure: SNP-by-phenotype association maps next to the trait network, with a color scale from high association to no association. Panels: single-marker single-trait test; lasso; graph-guided fused lasso. Trait labels: lung physiology-related traits (I); baseline FEV1 predicted value: MPVLung; pre-FEF predicted value; average nitric oxide value: online; body mass index; postbronchodilation FEV1, liters: spirometry; baseline FEV1 % predicted: spirometry; baseline predrug FEV1, % predicted.] The Q551R SNP codes for amino-acid changes in the intracellular signaling portion of the receptor. Exon 11 SNPs?

18 Simulation Results

19 Linkage Disequilibrium Structure in IL-4R Gene. [Figure: LD among SNP rs, SNP rs, and SNP Q551R, with r² = 0.64 and r² = 0.07.]
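The $r^2$ values on this slide are a standard pairwise LD measure. A minimal sketch computing it from phased haplotypes coded 0/1 (an assumed input format; the function name is hypothetical):

```python
import numpy as np

def ld_r2(h1, h2):
    """r^2 between two biallelic loci from phased haplotypes coded 0/1
    (1 = allele of interest): D = p_AB - p_A * p_B,
    r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B))."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    p_a, p_b = h1.mean(), h2.mean()          # allele frequencies
    p_ab = np.mean(h1 * h2)                  # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    return (p_ab - p_a * p_b) ** 2 / denom
```

For 0/1 codes this equals the squared Pearson correlation between the two haplotype vectors, so $r^2 = 0.64$ indicates strong LD and $r^2 = 0.07$ weak LD.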

20 Bias and Variance Tradeoff. The penalty function introduces bias into the estimation process but can reduce the variance. The amount of bias is controlled by selecting an appropriate value of the regularization parameter.
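The tradeoff can be made exact for ridge regression (penalty $\lambda \|\beta\|_2^2$), used here as a stand-in because, unlike the lasso, it has a closed form. The helper name is hypothetical. For a fixed design $X$ and noise variance $\sigma^2$, the squared bias grows and the total variance shrinks as $\lambda$ increases:

```python
import numpy as np

def ridge_bias_variance(X, beta, sigma2, lam):
    """Exact squared bias and total variance of the ridge estimator
    b_hat = (X^T X + lam I)^{-1} X^T y, for fixed X and y = X beta + e
    with E[e] = 0 and Cov(e) = sigma2 * I.
    Returns (||E[b_hat] - beta||^2, trace Cov(b_hat))."""
    p = X.shape[1]
    XtX = X.T @ X
    A = np.linalg.inv(XtX + lam * np.eye(p))
    bias = A @ XtX @ beta - beta             # E[b_hat] - beta
    cov = sigma2 * A @ XtX @ A               # A is symmetric
    return float(bias @ bias), float(np.trace(cov))
```

Evaluating this over a grid of `lam` values shows zero bias at `lam = 0` (when $X^TX$ is invertible) and the monotone exchange of bias for variance that motivates tuning the regularization parameter.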

21 Network-Constrained Regularization for Leveraging Pathway Information (Li and Li, 2008). Pathway databases serve as prior biological knowledge: KEGG, Reactome, BioCarta, BioCyc. The pathway information is leveraged to detect genes in pathways relevant to a given outcome.

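22 Graph Laplacian. Graph Laplacian: $L = D - W$. Weighted adjacency matrix $W$: $w_{ij} = w_{ji}$, with $w_{ij} = 0$ if there is no edge between nodes i and j. Degree matrix $D$: diagonal matrix with diagonal entries $d_i = \sum_j w_{ij}$. Normalized graph Laplacian: $L = I - D^{-1/2} W D^{-1/2}$ (for a graph without isolated nodes), which is symmetric and positive semidefinite.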
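A minimal sketch of the normalized Laplacian for a small weighted graph, together with the edge-sum form of $\beta^T L \beta$ used as the network penalty. It assumes a symmetric $W$ with zero diagonal and no isolated nodes; function names are hypothetical.

```python
import numpy as np

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2} for a symmetric weight matrix W
    with zero diagonal and every degree d_i > 0."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                        # degrees d_i = sum_j w_ij
    if np.any(d == 0):
        raise ValueError("isolated node: degree 0")
    inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(d)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]

def network_penalty(beta, W):
    """Edge-sum form: sum over pairs i < j of
    w_ij * (beta_i / sqrt(d_i) - beta_j / sqrt(d_j))^2."""
    W = np.asarray(W, dtype=float)
    s = beta / np.sqrt(W.sum(axis=1))
    i, j = np.triu_indices(len(beta), k=1)
    return float(np.sum(W[i, j] * (s[i] - s[j]) ** 2))
```

The edge-sum form is a sum of squares, which makes the positive semidefiniteness of $L$ evident; a connected graph has exactly one zero eigenvalue.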

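23 Network-Constrained Regularized Regression. Network-constrained regularization criterion: $\hat{\beta} = \arg\min_{\beta}\, (y - X\beta)^T (y - X\beta) + \lambda_1 \sum_j |\beta_j| + \lambda_2\, \beta^T L \beta$. Equivalently, the network penalty is $\beta^T L \beta = \sum_{i \sim j} w_{ij} \left( \beta_i / \sqrt{d_i} - \beta_j / \sqrt{d_j} \right)^2$, summing over each edge once. If $L = I$, the criterion becomes the elastic net.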

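24 Optimization. Cast it as a lasso optimization problem, $\hat{\beta} = \arg\min_{\beta}\, (y^* - X^*\beta)^T (y^* - X^*\beta) + \lambda_1 \sum_j |\beta_j|$, where $y^* = \begin{pmatrix} y \\ 0 \end{pmatrix}$, $X^* = \begin{pmatrix} X \\ \sqrt{\lambda_2}\, S^T \end{pmatrix}$, and $S$ is any matrix with $L = S S^T$.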
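The reduction to a lasso can be checked numerically. The sketch below assumes $L$ is symmetric positive semidefinite and obtains a factor $S$ with $L = SS^T$ from an eigendecomposition (any such factor works). This unscaled augmentation is one exact way to do the reduction; the helper name is hypothetical. The returned pair can be passed to any lasso solver with penalty $\lambda_1$.

```python
import numpy as np

def augment_for_network_lasso(X, y, L, lam2):
    """Build (X*, y*) with ||y* - X* b||^2 = ||y - X b||^2 + lam2 * b^T L b.
    S = V diag(sqrt(eigenvalues)) satisfies S S^T = L for symmetric PSD L."""
    evals, evecs = np.linalg.eigh(L)
    evals = np.clip(evals, 0.0, None)        # drop tiny negative round-off
    S = evecs * np.sqrt(evals)               # scale each eigenvector column
    X_star = np.vstack([X, np.sqrt(lam2) * S.T])                # (n + p) x p
    y_star = np.concatenate([y, np.zeros(L.shape[0])])          # n + p
    return X_star, y_star
```

The extra $p$ pseudo-observations carry the network penalty, so the only remaining nonsmooth term is the $\ell_1$ penalty, exactly as in the ordinary lasso.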

25 Simulation Studies. Model: 200 transcription factors, each regulating 10 genes; four transcription factors and their target genes are relevant to the given response.
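The network structure of this simulation can be sketched as below. The number of TFs and targets per TF come from the slide. The node ordering, unit edge weights, edges only between a TF and its own targets, and the choice of which four TFs are relevant are assumptions for illustration. The design has $200 \times (1 + 10) = 2200$ predictors, 2000 TF-target edges, and $4 \times 11 = 44$ relevant predictors.

```python
import numpy as np

def tf_network(n_tf=200, n_targets=10, n_relevant_tf=4):
    """Adjacency matrix W and relevant-predictor mask for a TF -> target design.
    Assumed layout: each TF is immediately followed by its n_targets genes,
    and each TF is linked (weight 1) only to its own targets."""
    block = n_targets + 1
    p = n_tf * block
    W = np.zeros((p, p))
    for t in range(n_tf):
        tf = t * block
        targets = np.arange(tf + 1, tf + block)
        W[tf, targets] = 1.0
        W[targets, tf] = 1.0
    relevant = np.zeros(p, dtype=bool)
    relevant[: n_relevant_tf * block] = True   # assumed: first TFs and their targets
    return W, relevant
```

A matrix like `W` is what the normalized Laplacian in the network-constrained penalty would be built from in such a simulation.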

26 Results from Simulation Study. Comparison of lasso, elastic net, and network-constrained regularized regression.

27 Analysis of Glioblastoma Dataset. Response: cancer survival/death. Predictors: 1533 genes on 33 KEGG pathways.

28 Gene Graph Components Relevant to Cancer Survival


More information

Parameter Estimation of Cognitive Attributes using the Crossed Random- Effects Linear Logistic Test Model with PROC GLIMMIX

Parameter Estimation of Cognitive Attributes using the Crossed Random- Effects Linear Logistic Test Model with PROC GLIMMIX Paper 1766-2014 Parameter Estimation of Cognitive Attributes using the Crossed Random- Effects Linear Logistic Test Model with PROC GLIMMIX ABSTRACT Chunhua Cao, Yan Wang, Yi-Hsin Chen, Isaac Y. Li University

More information

Evalua&ng Methods. Tandy Warnow

Evalua&ng Methods. Tandy Warnow Evalua&ng Methods Tandy Warnow You ve designed a new method! Now what? To evaluate a new method: Establish theore&cal proper&es. Evaluate on data. Compare the new method to other methods. How do you do

More information

Aspects of Statistical Modelling & Data Analysis in Gene Expression Genomics. Mike West Duke University

Aspects of Statistical Modelling & Data Analysis in Gene Expression Genomics. Mike West Duke University Aspects of Statistical Modelling & Data Analysis in Gene Expression Genomics Mike West Duke University Papers, software, many links: www.isds.duke.edu/~mw ABS04 web site: Lecture slides, stats notes, papers,

More information

Decomposition of the Genotypic Value

Decomposition of the Genotypic Value Decomposition of the Genotypic Value 1 / 17 Partitioning of Phenotypic Values We introduced the general model of Y = G + E in the first lecture, where Y is the phenotypic value, G is the genotypic value,

More information

Linear and Nonlinear Optimization

Linear and Nonlinear Optimization Linear and Nonlinear Optimization SECOND EDITION Igor Griva Stephen G. Nash Ariela Sofer George Mason University Fairfax, Virginia Society for Industrial and Applied Mathematics Philadelphia Contents Preface

More information

Quantitative genetics: traits controlled by alleles at many loci

Quantitative genetics: traits controlled by alleles at many loci Quantitative genetics: traits controlled by alleles at many loci Human phenotypic adaptations and diseases commonly involve the effects of many genes, each will small effect Quantitative genetics allows

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Why and how to make R packages. Bob Muscarella Aarhus University May 7, 2015

Why and how to make R packages. Bob Muscarella Aarhus University May 7, 2015 Why and how to make R packages Bob Muscarella Aarhus University May 7, 2015 What and how are R packages? Loading the package makes the components of the package available Packages store and organize

More information

GENOME-WIDE ASSOCIATION STUDIES

GENOME-WIDE ASSOCIATION STUDIES GENOME-WIDE ASSOCIATION STUDIES SUCCESSES AND PITFALLS IBT 2012 Human Genetics & Molecular Medicine Zané Lombard IDENTIFYING DISEASE GENES??? Nature, 15 Feb 2001 Science, 16 Feb 2001 IDENTIFYING DISEASE

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction 1.1 Motivation and Goals The increasing availability and decreasing cost of high-throughput (HT) technologies coupled with the availability of computational tools and data form a

More information

Uncovering interactions with Random Forests. Jake Michaelson Marit Ackermann Andreas Beyer

Uncovering interactions with Random Forests. Jake Michaelson Marit Ackermann Andreas Beyer Uncovering interactions with Random Forests Jake Michaelson Marit Ackermann Andreas eyer Random Forests >> ensembles of decision trees >> diverse trees trying to solve the same problem >> used frequently

More information

Nature Genetics: doi: /ng Supplementary Figure 1

Nature Genetics: doi: /ng Supplementary Figure 1 Supplementary Figure 1 Illustrative example of ptdt using height The expected value of a child s polygenic risk score (PRS) for a trait is the average of maternal and paternal PRS values. For example,

More information

Genome-wide Association Analysis Applied to Asthma-Susceptibility Gene. McCaw, Z., Wu, W., Hsiao, S., McKhann, A., Tracy, S.

Genome-wide Association Analysis Applied to Asthma-Susceptibility Gene. McCaw, Z., Wu, W., Hsiao, S., McKhann, A., Tracy, S. Genome-wide Association Analysis Applied to Asthma-Susceptibility Gene McCaw, Z., Wu, W., Hsiao, S., McKhann, A., Tracy, S. December 17, 2014 1 Introduction Asthma is a chronic respiratory disease affecting

More information

Introduction of Genome wide Complex Trait Analysis (GCTA) Presenter: Yue Ming Chen Location: Stat Gen Workshop Date: 6/7/2013

Introduction of Genome wide Complex Trait Analysis (GCTA) Presenter: Yue Ming Chen Location: Stat Gen Workshop Date: 6/7/2013 Introduction of Genome wide Complex Trait Analysis (GCTA) resenter: ue Ming Chen Location: Stat Gen Workshop Date: 6/7/013 Outline Brief review of quantitative genetics Overview of GCTA Ideas Main functions

More information

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Week 9 Hour 3 Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Stat 302 Notes. Week 9, Hour 3, Page 1 / 39 Stepwise Now that we've introduced interactions,

More information

Statistical Genetics : Gene Mappin g through Linkag e and Associatio n

Statistical Genetics : Gene Mappin g through Linkag e and Associatio n Statistical Genetics : Gene Mappin g through Linkag e and Associatio n Benjamin M Neale Manuel AR Ferreira Sarah E Medlan d Danielle Posthuma About the editors List of contributors Preface Acknowledgements

More information

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys Multiple Regression Analysis 1 CRITERIA FOR USE Multiple regression analysis is used to test the effects of n independent (predictor) variables on a single dependent (criterion) variable. Regression tests

More information

Haplotypes of VKORC1, NQO1 and GGCX, their effect on activity levels of vitamin K-dependent coagulation factors, and the risk of venous thrombosis

Haplotypes of VKORC1, NQO1 and GGCX, their effect on activity levels of vitamin K-dependent coagulation factors, and the risk of venous thrombosis Haplotypes of VKORC1, NQO1 and GGCX, their effect on activity levels of vitamin K-dependent coagulation factors, and the risk of venous thrombosis Haplotypes of VKORC1, NQO1 and GGCX, their effect on activity

More information

Prediction Model For Risk Of Breast Cancer Considering Interaction Between The Risk Factors

Prediction Model For Risk Of Breast Cancer Considering Interaction Between The Risk Factors INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME, ISSUE 0, SEPTEMBER 01 ISSN 81 Prediction Model For Risk Of Breast Cancer Considering Interaction Between The Risk Factors Nabila Al Balushi

More information

University of Groningen. Metabolic risk in people with psychotic disorders Bruins, Jojanneke

University of Groningen. Metabolic risk in people with psychotic disorders Bruins, Jojanneke University of Groningen Metabolic risk in people with psychotic disorders Bruins, Jojanneke IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from

More information

Regularized Multivariate Regression for Identifying. Master Predictors with Application to Integrative. Genomics Study of Breast Cancer

Regularized Multivariate Regression for Identifying. Master Predictors with Application to Integrative. Genomics Study of Breast Cancer Regularized Multivariate Regression for Identifying Master Predictors with Application to Integrative Genomics Study of Breast Cancer Jie Peng 1, Ji Zhu 2, Anna Bergamaschi 3, Wonshik Han 4, Dong-Young

More information

Imaging Genetics: Heritability, Linkage & Association

Imaging Genetics: Heritability, Linkage & Association Imaging Genetics: Heritability, Linkage & Association David C. Glahn, PhD Olin Neuropsychiatry Research Center & Department of Psychiatry, Yale University July 17, 2011 Memory Activation & APOE ε4 Risk

More information

Performing. linkage analysis using MERLIN

Performing. linkage analysis using MERLIN Performing linkage analysis using MERLIN David Duffy Queensland Institute of Medical Research Brisbane, Australia Overview MERLIN and associated programs Error checking Parametric linkage analysis Nonparametric

More information

A Network Partition Algorithm for Mining Gene Functional Modules of Colon Cancer from DNA Microarray Data

A Network Partition Algorithm for Mining Gene Functional Modules of Colon Cancer from DNA Microarray Data Method A Network Partition Algorithm for Mining Gene Functional Modules of Colon Cancer from DNA Microarray Data Xiao-Gang Ruan, Jin-Lian Wang*, and Jian-Geng Li Institute of Artificial Intelligence and

More information

De novo iden)fica)on of SNPs from RNA- seq data in non- model species

De novo iden)fica)on of SNPs from RNA- seq data in non- model species De novo iden)fica)on of SNPs from RNA- seq data in non- model species Hélène Lopez- Maestre 8th Novembre 2016 Why work with RNAseq? Lower cost SNPs from expressed regions SNPs with a more direct func:onal

More information

Multiscale factor models for molecular networks

Multiscale factor models for molecular networks Multiscale factor models for molecular networks Justin Guinney 1,2, Philip Febbo 1,3,4, Mauro Maggioni 5,6, and Sayan Mukherjee 5,6,7 Institute for Genome Sciences & Policy 1, Department of Medicine 2,

More information