Biostatistical modelling in genomics for clinical cancer studies


1 Biostatistical modelling in genomics for clinical cancer studies. Philippe Broët, JE 2492, Faculté de Médecine Paris-Sud, in collaboration with S. Richardson, Imperial College London. Statistical Latent Variables Models in the Health Sciences, Perugia, Italy, 6-8 September 2006. This work was supported by Entente Cordiale Cancer Research Bursaries.

2 Outline. Genomic biotechnologies in clinical research: introduction; Comparative Genomic Hybridization (CGH). Spatial mixture model-based approach: model; performance. Clinical example: lung carcinoma study. Conclusion.

3 Introduction

4 Genomic-oriented biotechnologies: thousands of measurements from one sample.

5 Genomic-oriented biotechnologies: thousands of measurements from one sample. CGH µarray: DNA.

6 Genomic-oriented biotechnologies: thousands of measurements from one sample. CGH µarray: DNA. cDNA/oligo µarray: mRNA.

7 Genomic-oriented biotechnologies: thousands of measurements from one sample. CGH µarray: DNA. cDNA/oligo µarray: mRNA. Protein µarray: protein.

8 Context: cancer research. Analysis of copy number changes of genomic sequences: loss (tumor suppressor gene) and gain (oncogene). Amplification of an oncogene or deletion of a tumor suppressor gene are important mechanisms of tumorigenesis.

9 Tumor suppressor gene = brake (-). Oncogene = accelerator (+). Normal cell.

10 Tumor suppressor genes - (e.g. p15)

11 Tumor suppressor genes - (e.g. p15). Oncogenes +++ (e.g. EGFR).

12 Tumor suppressor genes - (e.g. p15). Oncogenes +++ (e.g. EGFR). Cancer cell.

13 Tumor / normal. 1. Extraction (DNA). 2. Labelling (fluo). 3. (Co)-hybridization (oligo/BAC). 4. Scanning.

14 Tumor / normal. 1. Extraction (DNA). 2. Labelling (fluo). 3. (Co)-hybridization (oligo/BAC). 4. Scanning. The output is a set of quantitative measures of DNA content. Statistical challenges: provide genomic status information (deletion/gain/modal); estimate the error rate of the allocation.

15 Spatial mixture model

16 Mixture model approach. $Z_{ik}$: (observed log-ratio) measurement for the $i$th genomic sequence (GS), ordered along chromosome $k$. Three latent states (mixture model): loss / modal / gain copy state.

17 Mixture model approach. $Z_{ik}$: (observed log-ratio) measurement for the $i$th genomic sequence (GS), ordered along chromosome $k$. Three latent states (mixture model), each with its own density for $Z_{ik}$: loss, $f(\cdot \mid \theta_L)$; modal, $f(\cdot \mid \theta_M)$; gain, $f(\cdot \mid \theta_G)$.

18 Mixture model approach. $Z_{ik}$: (observed log-ratio) measurement for the $i$th genomic sequence (GS), ordered along chromosome $k$. Three latent states (mixture model), each with its own density for $Z_{ik}$: loss, $f(\cdot \mid \theta_L)$; modal, $f(\cdot \mid \theta_M)$; gain, $f(\cdot \mid \theta_G)$. Neighbouring measurements along the chromosome: $Z_{i-1,k}$, $Z_{ik}$, $Z_{i+1,k}$.

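19 Mixture model framework: $Z_{ik} \sim \sum_{c \in \{-1,0,1\}} w^c_{ik}\, f_c(\cdot \mid \theta)$, where $w^c_{ik}$ is the mixing proportion (or weight) for state $c$ and $f_c(\cdot \mid \theta)$ is the conditional density for state $c$. Latent structure for the mixture model: $L_{ik}$ is an unobserved (latent) categorical variable taking the value $c \in \{-1, 0, 1\}$ with probability $w^c_{ik}$, and $Z_{ik} \mid L_{ik} = c \sim f_c(\cdot \mid \theta)$ with $f_c(\cdot \mid \theta) = N(\cdot \mid \mu_c, \eta_c^2)$.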
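The latent-state reading of the mixture can be illustrated for a single log-ratio with all parameters held fixed. The sketch below applies Bayes' rule to the three weighted normal densities; the weights, means and standard deviations are hypothetical values chosen for illustration, and the result is a conditional probability given those values, not the full posterior obtained from the model.

```python
import math

STATES = (-1, 0, 1)  # loss, modal, gain


def normal_pdf(z, mean, sd):
    """Density of N(mean, sd^2) evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def state_probabilities(z, weights, means, sds):
    """P(L = c | z) for each state c, by Bayes' rule on the mixture."""
    joint = {c: weights[c] * normal_pdf(z, means[c], sds[c]) for c in STATES}
    total = sum(joint.values())  # value of the mixture density at z
    return {c: joint[c] / total for c in STATES}


# Hypothetical parameter values, for illustration only.
weights = {-1: 0.1, 0: 0.8, 1: 0.1}
means = {-1: -0.7, 0: 0.0, 1: 0.7}
sds = {-1: 0.2, 0: 0.2, 1: 0.2}

probs = state_probabilities(-0.65, weights, means, sds)
print(probs)  # the loss state (-1) receives almost all of the probability
```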

20 Spatial structure on the weights ($w^c_{ik}$ instead of a common $w^c$). Introduce 3 centred Markov random fields $\{x^c_{ik}\}$ with nearest neighbours along the chromosomes. Spatial neighbours of GS $g$: $x_{g-1}$ and $x_{g+1}$, on either side of $x_g$. Define the weights (mixture proportions) to depend on the chromosomal location via a logistic model in the (latent) spatial variable $x^c_{ik}$, which favours allocation of nearby GS to the same component.
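One common way to turn three real-valued spatial variables into mixture weights is the multinomial logistic (softmax) transform. The sketch below assumes that form; the slide names a logistic model without spelling it out, so the exact parameterisation and the function name `mixture_weights` are assumptions.

```python
import math


def mixture_weights(x_loss, x_modal, x_gain):
    """Map three spatial variables to weights (loss, modal, gain) that sum to 1.

    Assumed form: w_c = exp(x_c) / sum over c' of exp(x_c').
    """
    x = (x_loss, x_modal, x_gain)
    shift = max(x)  # subtracting the maximum avoids overflow in exp
    e = [math.exp(v - shift) for v in x]
    total = sum(e)
    return tuple(v / total for v in e)


# Hypothetical field values at one genomic sequence.
w = mixture_weights(-1.0, 2.0, -1.0)
print(w)  # the modal weight dominates because its field value is largest
```

Because neighbouring sequences have similar field values under the spatial prior, they also receive similar weights, which is what encourages nearby sequences to fall in the same component.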

21 Three latent Markov random fields $x^c_k = \{x^c_{1k}, \ldots, x^c_{Ik}\}$, each with an intrinsic Gaussian conditional autoregressive (ICAR) prior, where $\delta_{ik}$ is the set of neighbours of $i$ and $m_{ik}$ is the number of neighbours $h \in \delta_{ik}$, with the constraint $\sum_i x^c_{ik} = 0$. The variance parameters $\tau_{ck}^2$ of the ICAR act as a smoothing prior, so the switching structure between the states can differ between chromosomes. Conditional density $f_c(\cdot \mid \theta) = N(\cdot \mid \mu_c, \eta_c^2)$: the means and variances $(\mu_c, \eta_c^2)$ of the mixture components are common to all chromosomes (borrowing information).
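For intuition about the smoothing role of the ICAR variance, the sketch below computes the full conditional of one field value given its neighbours, assuming the standard first-order ICAR form on a chain: the conditional mean is the average of the neighbours and the conditional variance is $\tau^2$ divided by the number of neighbours. The function names and the numbers are illustrative.

```python
def chain_neighbours(i, n):
    """Indices of the nearest neighbours of position i on a chain of length n."""
    return [h for h in (i - 1, i + 1) if 0 <= h < n]


def icar_conditional(x, i, tau2):
    """Mean and variance of x[i] given the rest, under a first-order ICAR prior."""
    nb = chain_neighbours(i, len(x))
    m = len(nb)  # 2 inside the chromosome, 1 at either end
    mean = sum(x[h] for h in nb) / m
    return mean, tau2 / m


# Hypothetical centred field (values sum to zero) and variance parameter.
x = [0.5, 0.25, -0.75]
print(icar_conditional(x, 1, tau2=0.5))  # interior position: two neighbours
print(icar_conditional(x, 0, tau2=0.5))  # end position: one neighbour
```

A small `tau2` ties each value tightly to its neighbours (long runs in the same state), while a large `tau2` allows frequent switching.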

22 Inference & quantities of interest. Bayesian inference via MCMC. In particular, the latent state allocations $L_{ik}$ of the GS are sampled during the MCMC run. Compute the posterior probabilities $p^c_{ik} = P(L_{ik} = c \mid \text{data})$, $c = -1, 0, 1$. These are estimated within the algorithm as the number of iterations in which the GS is in state $c$, divided by the length of the simulation run.
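The counting estimator described above can be sketched as follows for a single genomic sequence. The list of sampled states is made up for illustration, and in practice any burn-in iterations would be discarded before counting.

```python
from collections import Counter


def posterior_state_probabilities(draws, states=(-1, 0, 1)):
    """Estimate P(L = c | data) as the fraction of MCMC draws equal to c."""
    counts = Counter(draws)
    n = len(draws)
    return {c: counts[c] / n for c in states}


# Hypothetical sampled allocations of one GS over 10 retained iterations.
draws = [0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
p = posterior_state_probabilities(draws)
print(p)
```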

23 Probabilistic allocation. We allocate a sequence to a modified state (loss/gain copy state) if its posterior probability of being in that state is above a threshold (otherwise it is allocated to the modal state). This yields a subset $S$ of genomic sequences classified as modified (deleted/amplified). Error rate estimate: $\widehat{\mathrm{FDR}}_S = \frac{1}{M} \sum_{m \in S} p^{*}_{0m}$, where $M$ is the size of the set $S$ and $p^{*}_{0m}$ is the posterior probability that sequence $m$ belongs to the modal state, i.e. that it is not truly modified. The threshold can be adjusted to reach a desired FDR, and vice versa.
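The allocation rule and the FDR estimate can be sketched as below. The posterior probabilities are invented for four sequences, and the rule "the larger of the loss and gain probabilities exceeds the threshold" is one plausible reading of the slide rather than a confirmed detail of the method.

```python
def select_modified(p_loss, p_gain, threshold):
    """Indices of sequences allocated to a modified state (loss or gain)."""
    return [m for m in range(len(p_loss)) if max(p_loss[m], p_gain[m]) > threshold]


def estimated_fdr(p_modal, selected):
    """Average posterior probability of the modal state over the selected set."""
    if not selected:
        return 0.0  # no discoveries, so no false discoveries
    return sum(p_modal[m] for m in selected) / len(selected)


# Hypothetical posterior probabilities for four sequences.
p_loss = [0.95, 0.05, 0.00, 0.10]
p_gain = [0.00, 0.05, 0.80, 0.30]
p_modal = [0.05, 0.90, 0.20, 0.60]

selected = select_modified(p_loss, p_gain, threshold=0.5)
fdr_hat = estimated_fdr(p_modal, selected)
print(selected, fdr_hat)
```

Raising the threshold shrinks the selected set and typically lowers the estimated FDR, so scanning over thresholds is a simple way to find the one that meets a target such as 5%.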

24 Performance of the model

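25 Simulation set-up. 200 simulated genomic sequences. Loss: $Z_{ik} \mid L_{ik} = -1 \sim N(\cdot \mid \mu_{-1} = (-0.7; -0.4),\ \eta^2 = \ldots)$. Modal: $Z_{ik} \mid L_{ik} = 0 \sim N(\cdot \mid \mu_0 = 0,\ \eta^2 = \ldots)$. Gain: $Z_{ik} \mid L_{ik} = +1 \sim N(\cdot \mid \mu_{+1} = (+0.7; +0.4),\ \eta^2 = \ldots)$. 50 simulated data sets.

26 Simulation set-up. 200 simulated genomic sequences. Loss: $Z_{ik} \mid L_{ik} = -1 \sim N(\cdot \mid \mu_{-1} = (-0.7; -0.4),\ \eta^2 = \ldots)$. Modal: $Z_{ik} \mid L_{ik} = 0 \sim N(\cdot \mid \mu_0 = 0,\ \eta^2 = \ldots)$. Gain: $Z_{ik} \mid L_{ik} = +1 \sim N(\cdot \mid \mu_{+1} = (+0.7; +0.4),\ \eta^2 = \ldots)$. 50 simulated data sets. Pattern of genomic alterations.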
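A data set of this kind could be generated roughly as follows. The component means follow the slide (a shift of 0.7 or 0.4), but the standard deviation and the positions of the altered blocks are hypothetical placeholders, since the variances and the alteration pattern are not given in the text.

```python
import random

N_SEQ = 200  # number of simulated genomic sequences, as on the slide


def simulate_profile(shift=0.7, sd=0.2, seed=0):
    """Simulate true states and log-ratios for one chromosome profile.

    `sd` and the block positions below are assumptions for illustration.
    """
    rng = random.Random(seed)
    states = [0] * N_SEQ
    for i in range(40, 60):    # hypothetical block of losses
        states[i] = -1
    for i in range(120, 150):  # hypothetical block of gains
        states[i] = 1
    means = {-1: -shift, 0: 0.0, 1: shift}
    z = [rng.gauss(means[c], sd) for c in states]
    return states, z


# 50 simulated data sets, one seed each.
datasets = [simulate_profile(shift=0.7, seed=s) for s in range(50)]
states, z = datasets[0]
```

Calling `simulate_profile(shift=0.4)` gives the harder scenario in which the modified states sit closer to the modal state.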

27 Calculate the realized (estimated) FDR, TPF and TNF. Realized FDR = # realized false discoveries (belonging to the modal copy state but claimed as modified) / # discoveries. Realized TPF (Se) = # realized discoveries among the truly modified sequences / # true positives (truly modified sequences). Realized TNF (Sp) = # realized non-discoveries among the truly unmodified sequences / # true negatives (truly unmodified sequences). Compare: the spatial mixture model; the classical mixture model (without spatial structure); spatial agglomerative clustering (CGH-Miner, Wang et al., 2005). All are run at the same FDR target (FDR default of 1% for CGH-Miner).
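When the true states are known, as in a simulation, the three realized rates can be computed directly. The sketch below treats a call as correct whenever a truly modified sequence is flagged as modified, without checking that the direction (loss versus gain) matches; that simplification, the function name and the toy data are assumptions.

```python
def realized_rates(true_states, called_modified):
    """Return (FDR, TPF, TNF) from true states (-1/0/1) and boolean calls."""
    pairs = list(zip(true_states, called_modified))
    discoveries = sum(1 for _, d in pairs if d)
    false_disc = sum(1 for t, d in pairs if d and t == 0)
    true_disc = discoveries - false_disc
    n_modified = sum(1 for t, _ in pairs if t != 0)
    n_unmodified = len(pairs) - n_modified
    true_nondisc = sum(1 for t, d in pairs if not d and t == 0)

    fdr = false_disc / discoveries if discoveries else 0.0
    tpf = true_disc / n_modified if n_modified else float("nan")
    tnf = true_nondisc / n_unmodified if n_unmodified else float("nan")
    return fdr, tpf, tnf


# Toy example with eight sequences.
true_states = [-1, -1, 0, 0, 0, 1, 1, 0]
called = [True, True, True, False, False, True, False, False]
fdr, tpf, tnf = realized_rates(true_states, called)
print(fdr, tpf, tnf)
```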

28 Model performance Operational characteristics

29 Model performance Operational characteristics

30 Model performance: FDR estimation (figure panels; x-axis: threshold)

31 Model performance: FDR estimation (figure panels; x-axis: threshold)

32 Results. The spatial mixture model-based approach has good operating characteristics: better Se/Sp than the independent mixture model; powerful detection of genomic changes; reliable FDR estimate.

33 Clinical example

34 Lung cancer study. In collaboration with GIS/NCC-Singapore (L. Miller, P. Tan). Sample selection: primary lung adenocarcinoma (pT2N0); the quality of the frozen tissue (presence of tumor cells) was checked by cytological apposition. CGH technology: 32K BAC arrays (2 chips); co-hybridization of tumor and normal samples.

37 Individual threshold related to 5% FDR target

38 Cyclin D1 immunohistochemistry. Gene amplification and protein overexpression; no gene amplification, no protein expression.

39 Conclusion

40 Spatial mixture model. Interesting operational performance. Selection of interesting GS based on an error criterion. More powerful than the independent mixture model. Extensions: incorporate genomic location information (distance function); incorporate additional components; consider other selection rules.


More information

Bayesian methods in health economics

Bayesian methods in health economics Bayesian methods in health economics Gianluca Baio University College London Department of Statistical Science g.baio@ucl.ac.uk Seminar Series of the Master in Advanced Artificial Intelligence Madrid,

More information

Sensory Cue Integration

Sensory Cue Integration Sensory Cue Integration Summary by Byoung-Hee Kim Computer Science and Engineering (CSE) http://bi.snu.ac.kr/ Presentation Guideline Quiz on the gist of the chapter (5 min) Presenters: prepare one main

More information

Genome-wide copy-number calling (CNAs not CNVs!) Dr Geoff Macintyre

Genome-wide copy-number calling (CNAs not CNVs!) Dr Geoff Macintyre Genome-wide copy-number calling (CNAs not CNVs!) Dr Geoff Macintyre Structural variation (SVs) Copy-number variations C Deletion A B C Balanced rearrangements A B A B C B A C Duplication Inversion Causes

More information

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer Genetic alterations of histone lysine methyltransferases and their significance in breast cancer Supplementary Materials and Methods Phylogenetic tree of the HMT superfamily The phylogeny outlined in the

More information

Computational Analysis of Genome-Wide DNA Copy Number Changes

Computational Analysis of Genome-Wide DNA Copy Number Changes Computational Analysis of Genome-Wide DNA Copy Number Changes Lei Song Thesis submitted to the faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements

More information

Kelvin Chan Feb 10, 2015

Kelvin Chan Feb 10, 2015 Underestimation of Variance of Predicted Mean Health Utilities Derived from Multi- Attribute Utility Instruments: The Use of Multiple Imputation as a Potential Solution. Kelvin Chan Feb 10, 2015 Outline

More information

CARISMA-LMS Workshop on Statistics for Risk Analysis

CARISMA-LMS Workshop on Statistics for Risk Analysis Department of Mathematics CARISMA-LMS Workshop on Statistics for Risk Analysis Thursday 28 th May 2015 Location: Department of Mathematics, John Crank Building, Room JNCK128 (Campus map can be found at

More information

A Multi-Sample Based Method for Identifying Common CNVs in Normal Human Genomic Structure Using High- Resolution acgh Data

A Multi-Sample Based Method for Identifying Common CNVs in Normal Human Genomic Structure Using High- Resolution acgh Data A Multi-Sample Based Method for Identifying Common CNVs in Normal Human Genomic Structure Using High- Resolution acgh Data Chihyun Park 1, Jaegyoon Ahn 1, Youngmi Yoon 2, Sanghyun Park 1 * 1 Department

More information

Systematic Analysis for Identification of Genes Impacting Cancers

Systematic Analysis for Identification of Genes Impacting Cancers Systematic Analysis for Identification of Genes Impacting Cancers Arpita Singhal Stanford University Saint Francis High School ABSTRACT Currently, vast amounts of molecular information involving genomic

More information

SiFit: inferring tumor trees from single-cell sequencing data under finite-sites models

SiFit: inferring tumor trees from single-cell sequencing data under finite-sites models Zafar et al. Genome Biology (2017) 18:178 DOI 10.1186/s13059-017-1311-2 METHOD Open Access SiFit: inferring tumor trees from single-cell sequencing data under finite-sites models Hamim Zafar 1,2, Anthony

More information

Using the Testlet Model to Mitigate Test Speededness Effects. James A. Wollack Youngsuk Suh Daniel M. Bolt. University of Wisconsin Madison

Using the Testlet Model to Mitigate Test Speededness Effects. James A. Wollack Youngsuk Suh Daniel M. Bolt. University of Wisconsin Madison Using the Testlet Model to Mitigate Test Speededness Effects James A. Wollack Youngsuk Suh Daniel M. Bolt University of Wisconsin Madison April 12, 2007 Paper presented at the annual meeting of the National

More information

Bayesian Benefit-Risk Assessment. Maria Costa GSK R&D

Bayesian Benefit-Risk Assessment. Maria Costa GSK R&D Assessment GSK R&D Disclosure is an employee and shareholder of GSK Data presented is based on human research studies funded and sponsored by GSK 2 Outline 1. Motivation 2. GSK s Approach to Benefit-Risk

More information

White Paper. Copy number variant detection. Sample to Insight. August 19, 2015

White Paper. Copy number variant detection. Sample to Insight. August 19, 2015 White Paper Copy number variant detection August 19, 2015 Sample to Insight CLC bio, a QIAGEN Company Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 Fax: +45 86 20 12 22 www.clcbio.com

More information

TRIPODS Workshop: Models & Machine Learning for Causal I. & Decision Making

TRIPODS Workshop: Models & Machine Learning for Causal I. & Decision Making TRIPODS Workshop: Models & Machine Learning for Causal Inference & Decision Making in Medical Decision Making : and Predictive Accuracy text Stavroula Chrysanthopoulou, PhD Department of Biostatistics

More information

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp The Stata Journal (22) 2, Number 3, pp. 28 289 Comparative assessment of three common algorithms for estimating the variance of the area under the nonparametric receiver operating characteristic curve

More information

Applications with Bayesian Approach

Applications with Bayesian Approach Applications with Bayesian Approach Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Outline 1 Missing Data in Longitudinal Studies 2 FMRI Analysis

More information

Bayesian Models for Combining Data Across Domains and Domain Types in Predictive fmri Data Analysis (Thesis Proposal)

Bayesian Models for Combining Data Across Domains and Domain Types in Predictive fmri Data Analysis (Thesis Proposal) Bayesian Models for Combining Data Across Domains and Domain Types in Predictive fmri Data Analysis (Thesis Proposal) Indrayana Rustandi Computer Science Department Carnegie Mellon University March 26,

More information

Searching for Temporal Patterns in AmI Sensor Data

Searching for Temporal Patterns in AmI Sensor Data Searching for Temporal Patterns in AmI Sensor Data Romain Tavenard 1,2, Albert A. Salah 1, Eric J. Pauwels 1 1 Centrum voor Wiskunde en Informatica, CWI Amsterdam, The Netherlands 2 IRISA/ENS de Cachan,

More information

MISSING DATA AND PARAMETERS ESTIMATES IN MULTIDIMENSIONAL ITEM RESPONSE MODELS. Federico Andreis, Pier Alda Ferrari *

MISSING DATA AND PARAMETERS ESTIMATES IN MULTIDIMENSIONAL ITEM RESPONSE MODELS. Federico Andreis, Pier Alda Ferrari * Electronic Journal of Applied Statistical Analysis EJASA (2012), Electron. J. App. Stat. Anal., Vol. 5, Issue 3, 431 437 e-issn 2070-5948, DOI 10.1285/i20705948v5n3p431 2012 Università del Salento http://siba-ese.unile.it/index.php/ejasa/index

More information

WINTHER: GUSTAVE ROUSSY GUSTAVE ROUSSY. NOM DU DOCUMENT / Date

WINTHER: GUSTAVE ROUSSY GUSTAVE ROUSSY. NOM DU DOCUMENT / Date WINTHER Study Jean-Charles Soria, Razelle Kurzrock, Josep Tabernero, Apostolia Tsimberidou, Jordi Rodon, Raanan Berger, Amir Onn, Gerald Batist, Eitan Rubin, Yohann Loriot, Catherine Bresson, Vladimir

More information

Challenges of CGH array testing in children with developmental delay. Dr Sally Davies 17 th September 2014

Challenges of CGH array testing in children with developmental delay. Dr Sally Davies 17 th September 2014 Challenges of CGH array testing in children with developmental delay Dr Sally Davies 17 th September 2014 CGH array What is CGH array? Understanding the test Benefits Results to expect Consent issues Ethical

More information

Reducing Decision Errors in the Paired Comparison of the Diagnostic Accuracy of Continuous Screening Tests

Reducing Decision Errors in the Paired Comparison of the Diagnostic Accuracy of Continuous Screening Tests Reducing Decision Errors in the Paired Comparison of the Diagnostic Accuracy of Continuous Screening Tests Brandy M. Ringham, 1 Todd A. Alonzo, 2 John T. Brinton, 1 Aarti Munjal, 1 Keith E. Muller, 3 Deborah

More information

Pancreatic Cancer Research and HMGB1 Signaling Pathway

Pancreatic Cancer Research and HMGB1 Signaling Pathway Pancreatic Cancer Research and HMGB1 Signaling Pathway Haijun Gong*, Paolo Zuliani*, Anvesh Komuravelli*, James R. Faeder #, Edmund M. Clarke* * # The Hallmarks of Cancer D. Hanahan and R. A. Weinberg

More information

Bayesian and Frequentist Approaches

Bayesian and Frequentist Approaches Bayesian and Frequentist Approaches G. Jogesh Babu Penn State University http://sites.stat.psu.edu/ babu http://astrostatistics.psu.edu All models are wrong But some are useful George E. P. Box (son-in-law

More information

Integrated Analysis of Copy Number and Gene Expression

Integrated Analysis of Copy Number and Gene Expression Integrated Analysis of Copy Number and Gene Expression Nexus Copy Number provides user-friendly interface and functionalities to integrate copy number analysis with gene expression results for the purpose

More information

Combining Risks from Several Tumors Using Markov Chain Monte Carlo

Combining Risks from Several Tumors Using Markov Chain Monte Carlo University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln U.S. Environmental Protection Agency Papers U.S. Environmental Protection Agency 2009 Combining Risks from Several Tumors

More information