Statistical Issues in Translational Cancer Research

Martin Bøgsted
Department of Haematology, Aalborg University Hospital, and Department of Clinical Medicine, Aalborg University
Statistical issues, Klitgaarden 2013, November 13

"The only useful function of a statistician is to make predictions, and thus to provide a basis for action." (W. E. Deming)

Programme
  - Presentation (1/2 hour)
  - Journal club (1/2 hour)
      - Wright et al., 2000, Nature
      - Hans et al., 2004, Blood
      - Hernandez-Ilizaliturri et al., 2011, Cancer
  - Discussion (1/2 hour)
      - How did the ABC/GCB classification find its way into clinical trials?
      - Could this have been done more efficiently?
      - What should we do in the future to speed up the translational process?
      - Could better statistical insight have helped during the process?

Outline
  - Motivation
  - The biomarker vocabulary
  - Predictive biomarkers and classifiers
  - Phases in clinical trials
  - The translational pathway
  - A personalised drug development strategy (our own)
      - Unsupervised cluster analysis
      - Differentially expressed genomic features
      - Build the classifier
      - Feature selection
      - Assay development
      - Validation
      - Design of clinical trials
  - Discussion

Motivation
The recognition of the heterogeneity of tumors of the same primary site, the availability of genomic tools for characterizing tumors, and the focus on molecularly targeted drugs have resulted in increased interest in predictive classification problems and a need for new clinical trial designs.

The biomarker vocabulary
  - Traditional biomarkers are measured to track the pace of a disease, increasing as the disease progresses and decreasing as it regresses (surrogate endpoints).
  - Prognostic biomarkers are measured before treatment to indicate which patients receiving standard treatment have a sufficiently good prognosis that they do not need additional treatment.
  - Predictive biomarkers are measured before treatment to identify who is likely or unlikely to benefit from a particular treatment.

Predictive biomarkers and classifiers
(Taken from Richard Simon, NCI)
  - Most prognostic factors are not used because they are not therapeutically relevant:
      - Most prognostic factor studies are not focused on a clear objective.
      - They use a convenience sample of patients for whom tissue is available.
      - Often the patients are too heterogeneous to support therapeutically relevant conclusions.
  - Most cancer treatments benefit only a minority of the patients to whom they are administered.
  - Being able to predict which patients are likely to benefit would
      - save patients from unnecessary toxicity and enhance their chance of receiving a drug that helps them;
      - help control medical costs.
  - Medicine needs predictive, not prognostic, biomarkers.
  - A predictive classifier is a method that, on the basis of a number of predictive biomarkers measured before treatment, predicts whether a particular treatment is likely to be beneficial.

Phases in clinical trials
  Drug validation | Biomarker validation
  Phase I         | Stage I
  Phase II        | Stage II
  Phase III       | Stage III
  Phase IV        | Stage IV

The translational pathway
  - Assay development
  - Validation of predictive models
  - Design of clinical trials

A personalised drug development strategy
  - Tumour biopsies from cancer patients (n)
  - Tumour genomic analysis
  - Supervised/unsupervised cluster analysis
  - Partition patients into new clusters
  - Differentially expressed features (p)
  - Build classifier (p ≫ n)
  - Feature enrichment
  - Drug target identification
  - Assay development
  - Clinical trials

Unsupervised cluster analysis: the algorithm
Given a set of n items to be clustered and an n × n distance (or similarity) matrix, the basic process of hierarchical clustering is:
  1. Start by assigning each item to its own cluster, so that n items give n clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain.
  2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that there is now one cluster fewer.
  3. Compute the distances (similarities) between the new cluster and each of the old clusters.
  4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size n.
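The four steps can be written out directly in Python. This is a minimal and deliberately naive sketch. It assumes a symmetric distance matrix. The slide does not say how cluster-to-cluster distances are computed in step 3, so the sketch assumes average linkage by default and offers single and complete linkage as alternatives. The function name `agglomerative` is hypothetical.

```python
import numpy as np

def agglomerative(dist, linkage="average"):
    """Naive hierarchical clustering on an n x n distance matrix.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a frozenset of item indices.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Step 1: every item starts in its own cluster.
    clusters = [frozenset([i]) for i in range(n)]
    merges = []

    def cluster_distance(a, b):
        # Step 3: derive a cluster-to-cluster distance from the item distances.
        pair = [d[i, j] for i in a for j in b]
        if linkage == "single":
            return min(pair)
        if linkage == "complete":
            return max(pair)
        return sum(pair) / len(pair)  # average linkage (assumed default)

    # Step 4: repeat until a single cluster of size n remains.
    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                dxy = cluster_distance(clusters[x], clusters[y])
                if best is None or dxy < best[0]:
                    best = (dxy, x, y)
        dxy, x, y = best
        merges.append((clusters[x], clusters[y], dxy))
        merged = clusters[x] | clusters[y]
        del clusters[y]          # y > x, so index x is still valid afterwards
        clusters[x] = merged
    return merges
```

For real data, `scipy.cluster.hierarchy.linkage(..., method="average")` implements the same idea much more efficiently. It expects a condensed distance vector, which `scipy.spatial.distance.squareform` produces from a square matrix.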

Unsupervised cluster analysis: examples
Alizadeh et al., 2000, Nature

Differentially expressed genomic features
  - Differences in expression are detected by variations of the t-test.
  - Multiple testing correction (Bonferroni, FDR, etc.).
  - Advances in biotechnology require new approaches such as linear and linear mixed models, generalized linear mixed models, Bayesian approaches, etc.
Kloster et al., 2012, BMC Genomics
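The first two bullets can be illustrated with a short sketch. It assumes a genes × samples matrix for each group, runs a per-gene Welch t-test (one common variation of the t-test), and then applies Bonferroni and Benjamini-Hochberg (FDR) corrections. The simulated data and the function names are hypothetical. For real microarray or sequencing data, the model-based approaches in the last bullet are usually preferable.

```python
import numpy as np
from scipy import stats

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank that meets its threshold
        reject[order[: k + 1]] = True     # reject every hypothesis up to that rank
    return reject

def differential_expression(group_a, group_b, alpha=0.05):
    """Per-gene Welch t-tests (genes in rows, samples in columns),
    with Bonferroni and BH (FDR) corrected decisions."""
    t, p = stats.ttest_ind(group_a, group_b, axis=1, equal_var=False)
    bonferroni = p <= alpha / p.size
    fdr = benjamini_hochberg(p, alpha)
    return t, p, bonferroni, fdr

# Hypothetical data: 1000 genes, 10 samples per group, first 10 genes shifted.
rng = np.random.default_rng(0)
a = rng.normal(size=(1000, 10))
b = rng.normal(size=(1000, 10))
b[:10] += 10.0
t, p, bonf, fdr = differential_expression(a, b)
print("Bonferroni rejections:", bonf.sum(), " BH rejections:", fdr.sum())
```

Every Bonferroni rejection is also a BH rejection, so BH is the less conservative choice when thousands of features are tested at once.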

Build a classifier
  - Demonstrating statistical significance of prognostic factors is NOT the same as demonstrating predictive accuracy.
  - Statisticians (and other scientists) are used to inference, not prediction.
  - Most statistical methods were not developed for p ≫ n prediction problems.
[Figure: heatmap of subpopulation profiles (colour key: row Z score). Rows: genes, including LMO2, BCL6, IRF4, PRDM1, XBP1 and MYC. Columns: naive, centroblast, centrocyte, memory and plasmablast B-cell samples labelled B233 to B250. Dybkær et al., 2013, in prep.]

Introduction to classifier training
Classification begins with a specification of two spaces:
  $\mathcal{X} = \mathbb{R}^p$, the p-dimensional Euclidean space of feature vectors;
  $\mathcal{Y} = \{1, \dots, k\}$, the k classes or class labels.
Assume $\{(X_i, Y_i)\}_{i=1}^{n}$ is a collection of training data. The empirical risk is then defined as
  $$\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(X_i), Y_i\big),$$
where $\ell$ is a loss function measuring the loss incurred by erroneous decisions. Statistical learning can now be formulated as minimizing the empirical risk over a class $\mathcal{F}$ of candidate classifiers:
  $$\hat{f}_n = \arg\min_{f \in \mathcal{F}} \hat{R}_n(f).$$
Consider, e.g., the 0-1 loss $\ell\big(\hat{f}_n(X), Y\big) = \mathbf{1}\{\hat{f}_n(X) \neq Y\}$.
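The following small sketch makes empirical risk minimization concrete for k = 2 classes (labels coded 0 and 1) under the 0-1 loss. The slide leaves the class F unspecified. As an assumption, F is taken here to be the finite set of decision stumps, each a threshold on a single feature, so the arg min can be found by exhaustive search. The function names are hypothetical.

```python
import numpy as np

def empirical_risk(predict, X, y):
    """0-1 empirical risk: the fraction of training points misclassified."""
    return float(np.mean(predict(X) != y))

def erm_stumps(X, y):
    """Exhaustive empirical risk minimization over decision stumps.

    Hypothetical class F: f(x) = 1{x_j > t} or its complement 1{x_j <= t},
    for every feature j and every observed value t; labels are in {0, 1}.
    Returns (risk, j, t, flipped, predict) for the first minimizer found.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for flip in (False, True):
                # Default arguments freeze the current j, t and flip in the closure.
                def predict(Z, j=j, t=t, flip=flip):
                    pred = (np.asarray(Z, dtype=float)[:, j] > t).astype(int)
                    return 1 - pred if flip else pred
                risk = empirical_risk(predict, X, y)
                if best is None or risk < best[0]:
                    best = (risk, j, t, flip, predict)
    return best
```

The minimizer is generally not unique, and a zero empirical risk says little about the true risk. That gap is the subject of the model assessment slides.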

Model assessment
[Figure (Robert Nowak, 2011): polynomial fits of y against x for increasing polynomial degree, and prediction error plotted against model complexity, with curves for the empirical risk and the true risk and regions marked underfitting, best model and overfitting.]
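The pattern in the figure can be reproduced with a few lines of NumPy. This sketch assumes a hypothetical data-generating process (a sine curve plus Gaussian noise). It fits polynomials of increasing degree to a small training sample and uses a large independent sample as a stand-in for the true risk. Because the models are nested, the training (empirical) risk cannot increase as the degree grows. The test risk typically falls and then rises again.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n):
    """Hypothetical data: y = sin(pi x) + Gaussian noise, x uniform on [-1, 1]."""
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.3, size=n)
    return x, y

def mse(coef, x, y):
    """Mean squared prediction error of the polynomial with coefficients coef."""
    return float(np.mean((np.polyval(coef, x) - y) ** 2))

x_train, y_train = simulate(20)
x_test, y_test = simulate(2000)   # large fresh sample approximates the true risk

results = []
for degree in range(11):
    coef = np.polyfit(x_train, y_train, degree)
    results.append((degree, mse(coef, x_train, y_train), mse(coef, x_test, y_test)))

for degree, train_risk, test_risk in results:
    print(f"degree {degree:2d}: empirical risk {train_risk:.3f}, test risk {test_risk:.3f}")
```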

Strategies to avoid overfitting
Use, e.g.,
  - dimension reduction (method of sieves),
  - penalization (shrinkage),
  - Bayesian methods,
in combination with
  - complicated mathematics (minimax lower bounds), or
  - hold-out methods (e.g. leave-one-out cross-validation).
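As one illustration of combining penalization with a hold-out method, the sketch below fits ridge regression (squared-error loss plus an L2 penalty, one form of shrinkage) and chooses the penalty by leave-one-out cross-validation. For ridge, the leave-one-out residuals can be obtained exactly from a single fit through the identity e_(i) = e_i / (1 - h_ii), where h_ii are the diagonal elements of the hat matrix. This avoids n refits, and computing the hat matrix through the n × n matrix XX' keeps the calculation cheap when p ≫ n. The regression setting (no intercept), the simulated data and the penalty grid are assumptions for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge coefficients: argmin_b ||y - X b||^2 + lam * ||b||^2 (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_loocv(X, y, lam):
    """Exact leave-one-out mean squared error for ridge from a single fit."""
    n = X.shape[0]
    # Hat matrix H = X (X'X + lam I_p)^-1 X' = K (K + lam I_n)^-1 with K = X X',
    # which only needs an n x n inverse (cheap when p >> n).
    K = X @ X.T
    H = K @ np.linalg.inv(K + lam * np.eye(n))
    loo_resid = (y - H @ y) / (1.0 - np.diag(H))
    return float(np.mean(loo_resid ** 2))

# Hypothetical p >> n data with five informative features and no intercept.
rng = np.random.default_rng(0)
n, p = 40, 500
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.normal(size=n)

grid = [0.1, 1.0, 10.0, 100.0, 1000.0]
loo = {lam: ridge_loocv(X, y, lam) for lam in grid}
best_lam = min(loo, key=loo.get)
b_hat = ridge_fit(X, y, best_lam)
print("LOOCV error by penalty:", loo, " chosen:", best_lam)
```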

Model assessment
Normally one works with
  - the sample (the data), and
  - the population.
Ideally one splits the sample into the following data sets:
  - Training set: a sub-sample used for learning, that is, to fit the parameters (e.g. weights) of the classifier.
  - Validation set: a sub-sample used to tune the hyperparameters (e.g. architecture, not weights) of a classifier.
  - Test set: a sub-sample used only to assess the performance (generalization) of a fully specified classifier.
Note that the training and validation sets are often combined, and tuning as well as assessment are done by cross-validation over the training set.
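The following sketch shows this workflow under stated assumptions. It uses a hypothetical nearest-centroid classifier whose single tuning parameter is the number m of top-ranked features, simulated p ≫ n data, and 5-fold cross-validation on the training set in place of a separate validation set. Feature selection is repeated inside every fold, and the test set is used exactly once, after tuning is finished. Selecting features on all the data first would make the cross-validated error optimistically biased.

```python
import numpy as np

def top_features(X, y, m):
    """Indices of the m features with the largest standardized mean difference."""
    a, b = X[y == 0], X[y == 1]
    pooled_sd = np.sqrt((a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)) / 2) + 1e-12
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled_sd
    return np.argsort(score)[::-1][:m]

def fit_centroids(X, y, m):
    """Hypothetical nearest-centroid classifier on the m top-ranked features."""
    feats = top_features(X, y, m)
    return feats, X[y == 0][:, feats].mean(axis=0), X[y == 1][:, feats].mean(axis=0)

def predict(model, X):
    """Assign each sample to the class with the nearer centroid (Euclidean distance)."""
    feats, c0, c1 = model
    Z = X[:, feats]
    d0 = ((Z - c0) ** 2).sum(axis=1)
    d1 = ((Z - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def cv_error(X, y, m, k=5, seed=0):
    """k-fold CV error; feature selection is redone inside every training fold."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        model = fit_centroids(X[train], y[train], m)
        errors.append(np.mean(predict(model, X[fold]) != y[fold]))
    return float(np.mean(errors))

# Hypothetical data: 100 samples, 2000 features, 20 of them informative.
rng = np.random.default_rng(42)
n, p = 100, 2000
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :20] += 1.0

# 1. Set aside a test set before any tuning or feature selection.
perm = rng.permutation(n)
test, train = perm[:30], perm[30:]

# 2. Tune m by cross-validation on the training set only.
grid = [5, 10, 20, 50, 200]
cv = {m: cv_error(X[train], y[train], m) for m in grid}
best_m = min(cv, key=cv.get)

# 3. Refit on the full training set; assess once on the untouched test set.
model = fit_centroids(X[train], y[train], best_m)
test_error = float(np.mean(predict(model, X[test]) != y[test]))
print(f"chosen m = {best_m}, CV error = {cv[best_m]:.2f}, test error = {test_error:.2f}")
```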

Feature selection
  - Differential coexpression: ABC vs. GCB (LMPP)
  - Gene ontology analysis
  - Weighted network analysis
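Differential coexpression asks whether two genes are correlated differently in two groups, for instance ABC and GCB samples. One simple way to test this for a single gene pair is a Fisher z-test comparing the two Pearson correlations. This is offered as a sketch, not as the method used in the talk. It assumes approximately bivariate normal expression values and independent groups. In a genome-wide screen the resulting p-values would need multiple testing correction. The function name is hypothetical.

```python
import numpy as np
from scipy import stats

def diff_coexpression(x_a, y_a, x_b, y_b):
    """Test whether corr(x, y) differs between groups A and B (Fisher z-test).

    Returns (r_a, r_b, z, two-sided p-value).
    """
    r_a = np.corrcoef(x_a, y_a)[0, 1]
    r_b = np.corrcoef(x_b, y_b)[0, 1]
    n_a, n_b = len(x_a), len(x_b)
    # Fisher transform: arctanh(r) is approximately normal with variance 1/(n - 3).
    se = np.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    z = (np.arctanh(r_a) - np.arctanh(r_b)) / se
    p = 2.0 * stats.norm.sf(abs(z))
    return r_a, r_b, z, p
```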

Assay development
  - Microarrays: typically not much used.
  - qPCR
  - Flow cytometry
  - Immunohistochemistry
Hans, 2004, Blood
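The Hans et al. (2004) algorithm is a well-known example of translating the microarray-defined cell-of-origin classification into an immunohistochemistry assay. As we understand the published algorithm, it is a decision tree on CD10, BCL6 and MUM1, and a marker counts as positive when at least 30% of tumour cells stain. The sketch below encodes that tree. The function name and the percentage-based inputs are our own choices. The Hans "non-GCB" group approximates, but is not identical to, the gene-expression-defined ABC group.

```python
def hans_classifier(cd10_pct, bcl6_pct, mum1_pct, cutoff=30.0):
    """Cell-of-origin call (GCB / non-GCB) from the percentage of stained tumour cells.

    Sketch of the Hans et al. (2004) decision tree; a marker is taken as
    positive when at least `cutoff` percent of tumour cells stain.
    """
    def positive(pct):
        return pct >= cutoff

    if positive(cd10_pct):      # CD10+ -> GCB
        return "GCB"
    if not positive(bcl6_pct):  # CD10-, BCL6- -> non-GCB
        return "non-GCB"
    # CD10-, BCL6+: MUM1 decides.
    return "non-GCB" if positive(mum1_pct) else "GCB"
```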

Validation: drug target identification
  - Cell line models
  - Animal models
  - Retrospective data analysis
  - Phase II/Stage II clinical trials
Falgreen et al., 2013, submitted

Validation: the classifier
  - Retrospective data analysis
  - Phase II/Stage II clinical trials

Design of clinical trials (Phase III/Stage III)
  - End point
      - Tumour activity
      - Time to progression
      - Overall survival
  - Design issues
      - Enrichment designs, etc. (Maitournam and Simon, 2005, Stat. Med.)
  - Sample size calculations
      - allow us to determine the sample size required to estimate the performance of a classifier with a given precision;
      - allow us to determine the sample size required to detect an effect of a given size with a given significance level and power.
  - The protocol
      - Analysis plan
      - Sample size calculations
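Two textbook normal-approximation formulas illustrate the two kinds of sample size calculation listed above. The first gives the number of patients needed for a Wald confidence interval of a classifier's accuracy (or any proportion) to have a chosen half-width. The second is Schoenfeld's formula for the number of events needed to detect a hazard ratio with a two-sided log-rank test under 1:1 randomization. It counts events, not patients, and converting it to patients requires an assumption about the event rate. The function names and example values are illustrative.

```python
import math
from statistics import NormalDist

def n_for_accuracy(p_expected, half_width, conf=0.95):
    """Patients needed so the Wald CI for a proportion has the given half-width."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z ** 2 * p_expected * (1 - p_expected) / half_width ** 2)

def events_for_hazard_ratio(hr, alpha=0.05, power=0.8):
    """Schoenfeld: events needed for a two-sided log-rank test, 1:1 allocation."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(4 * (z_a + z_b) ** 2 / math.log(hr) ** 2)

print(n_for_accuracy(0.8, 0.05))        # expected accuracy 0.8, +/- 0.05
print(events_for_hazard_ratio(0.7))     # HR 0.7, 5% level, 80% power
```

With an expected accuracy of 0.8 and a half-width of 0.05, the first function gives 246 patients. Detecting a hazard ratio of 0.7 with 80% power at the 5% level requires 247 events.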

Discussion


More information

What can we contribute to cancer research and treatment from Computer Science or Mathematics? How do we adapt our expertise for them

What can we contribute to cancer research and treatment from Computer Science or Mathematics? How do we adapt our expertise for them From Bioinformatics to Health Information Technology Outline What can we contribute to cancer research and treatment from Computer Science or Mathematics? How do we adapt our expertise for them Introduction

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 1, Jan Feb 2017

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 1, Jan Feb 2017 RESEARCH ARTICLE Classification of Cancer Dataset in Data Mining Algorithms Using R Tool P.Dhivyapriya [1], Dr.S.Sivakumar [2] Research Scholar [1], Assistant professor [2] Department of Computer Science

More information

Vision as Bayesian inference: analysis by synthesis?

Vision as Bayesian inference: analysis by synthesis? Vision as Bayesian inference: analysis by synthesis? Schwarz Andreas, Wiesner Thomas 1 / 70 Outline Introduction Motivation Problem Description Bayesian Formulation Generative Models Letters, Text Faces

More information

Comparison of Gene Set Analysis with Various Score Transformations to Test the Significance of Sets of Genes

Comparison of Gene Set Analysis with Various Score Transformations to Test the Significance of Sets of Genes Comparison of Gene Set Analysis with Various Score Transformations to Test the Significance of Sets of Genes Ivan Arreola and Dr. David Han Department of Management of Science and Statistics, University

More information

Reliability of Ordination Analyses

Reliability of Ordination Analyses Reliability of Ordination Analyses Objectives: Discuss Reliability Define Consistency and Accuracy Discuss Validation Methods Opening Thoughts Inference Space: What is it? Inference space can be defined

More information

Graphical Modeling Approaches for Estimating Brain Networks

Graphical Modeling Approaches for Estimating Brain Networks Graphical Modeling Approaches for Estimating Brain Networks BIOS 516 Suprateek Kundu Department of Biostatistics Emory University. September 28, 2017 Introduction My research focuses on understanding how

More information

Bayesian (Belief) Network Models,

Bayesian (Belief) Network Models, Bayesian (Belief) Network Models, 2/10/03 & 2/12/03 Outline of This Lecture 1. Overview of the model 2. Bayes Probability and Rules of Inference Conditional Probabilities Priors and posteriors Joint distributions

More information

L. Ziaei MS*, A. R. Mehri PhD**, M. Salehi PhD***

L. Ziaei MS*, A. R. Mehri PhD**, M. Salehi PhD*** Received: 1/16/2004 Accepted: 8/1/2005 Original Article Application of Artificial Neural Networks in Cancer Classification and Diagnosis Prediction of a Subtype of Lymphoma Based on Gene Expression Profile

More information

Biopharma and HTG Molecular Diagnostics Sample Sensitivity Data

Biopharma and HTG Molecular Diagnostics Sample Sensitivity Data Biopharma and HTG Molecular Diagnostics Sample Sensitivity Data Accelerating your success. Working in harmony with next-generation sequencing (NGS) platforms, HTG s patented chemistry, multiplexed assays

More information

Learning from data when all models are wrong

Learning from data when all models are wrong Learning from data when all models are wrong Peter Grünwald CWI / Leiden Menu Two Pictures 1. Introduction 2. Learning when Models are Seriously Wrong Joint work with John Langford, Tim van Erven, Steven

More information

Predicting Kidney Cancer Survival from Genomic Data

Predicting Kidney Cancer Survival from Genomic Data Predicting Kidney Cancer Survival from Genomic Data Christopher Sauer, Rishi Bedi, Duc Nguyen, Benedikt Bünz Abstract Cancers are on par with heart disease as the leading cause for mortality in the United

More information

Big Data and Machine Learning in RCTs An overview

Big Data and Machine Learning in RCTs An overview Big Data and Machine Learning in RCTs An overview Dario Gregori Unit of Biostatistics, Epidemiology and Public Health Department of Cardiac, Thoracic and Vascular Sciences University of Padova dario.gregori@unipd.it

More information

Improving conventional prognosticators in diffuse large B cell lymphoma using marker ratios

Improving conventional prognosticators in diffuse large B cell lymphoma using marker ratios Improving conventional prognosticators in diffuse large B cell lymphoma using marker ratios Kim-Anh LÊ CAO NHMRC Career Development Fellow, Statistician The University of Queensland Diamantina Institute

More information

Expression of NOTCH3 exon 16 differentiates Diffuse Large B-cell Lymphoma into molecular subtypes and is associated with prognosis

Expression of NOTCH3 exon 16 differentiates Diffuse Large B-cell Lymphoma into molecular subtypes and is associated with prognosis Aalborg Universitet Expression of NOTCH3 exon 16 differentiates Diffuse Large B-cell Lymphoma into molecular subtypes and is associated with prognosis Jespersen, Ditte Starberg; Schönherz, Anna A; Due,

More information

CLASSIFICATION OF BREAST CANCER INTO BENIGN AND MALIGNANT USING SUPPORT VECTOR MACHINES

CLASSIFICATION OF BREAST CANCER INTO BENIGN AND MALIGNANT USING SUPPORT VECTOR MACHINES CLASSIFICATION OF BREAST CANCER INTO BENIGN AND MALIGNANT USING SUPPORT VECTOR MACHINES K.S.NS. Gopala Krishna 1, B.L.S. Suraj 2, M. Trupthi 3 1,2 Student, 3 Assistant Professor, Department of Information

More information

High grade B-cell lymphomas (HGBL): Altered terminology in the 2016 WHO Classification (Update of the 4 th Edition) and practical issues Xiao-Qiu Li,

High grade B-cell lymphomas (HGBL): Altered terminology in the 2016 WHO Classification (Update of the 4 th Edition) and practical issues Xiao-Qiu Li, High grade B-cell lymphomas (HGBL): Altered terminology in the 2016 WHO Classification (Update of the 4 th Edition) and practical issues Xiao-Qiu Li, M.D., Ph.D. Fudan University Shanghai Cancer Center

More information

The generation of antibody-secreting plasma cells

The generation of antibody-secreting plasma cells REVIEWS The generation of antibody-secreting plasma cells Stephen L. Nutt, Philip D. Hodgkin, David M. Tarlinton and Lynn M. Corcoran Abstract The regulation of antibody production is linked to the generation

More information

The Roles of Short Term Endpoints in. Clinical Trial Planning and Design

The Roles of Short Term Endpoints in. Clinical Trial Planning and Design The Roles of Short Term Endpoints in Clinical Trial Planning and Design Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Roche, Welwyn Garden

More information

Recognizing Scenes by Simulating Implied Social Interaction Networks

Recognizing Scenes by Simulating Implied Social Interaction Networks Recognizing Scenes by Simulating Implied Social Interaction Networks MaryAnne Fields and Craig Lennon Army Research Laboratory, Aberdeen, MD, USA Christian Lebiere and Michael Martin Carnegie Mellon University,

More information

Detection and Classification of Lung Cancer Using Artificial Neural Network

Detection and Classification of Lung Cancer Using Artificial Neural Network Detection and Classification of Lung Cancer Using Artificial Neural Network Almas Pathan 1, Bairu.K.saptalkar 2 1,2 Department of Electronics and Communication Engineering, SDMCET, Dharwad, India 1 almaseng@yahoo.co.in,

More information

A NEW TRIAL DESIGN FULLY INTEGRATING BIOMARKER INFORMATION FOR THE EVALUATION OF TREATMENT-EFFECT MECHANISMS IN PERSONALISED MEDICINE

A NEW TRIAL DESIGN FULLY INTEGRATING BIOMARKER INFORMATION FOR THE EVALUATION OF TREATMENT-EFFECT MECHANISMS IN PERSONALISED MEDICINE A NEW TRIAL DESIGN FULLY INTEGRATING BIOMARKER INFORMATION FOR THE EVALUATION OF TREATMENT-EFFECT MECHANISMS IN PERSONALISED MEDICINE Dr Richard Emsley Centre for Biostatistics, Institute of Population

More information

Gene Selection for Tumor Classification Using Microarray Gene Expression Data

Gene Selection for Tumor Classification Using Microarray Gene Expression Data Gene Selection for Tumor Classification Using Microarray Gene Expression Data K. Yendrapalli, R. Basnet, S. Mukkamala, A. H. Sung Department of Computer Science New Mexico Institute of Mining and Technology

More information

Summary of main challenges and future directions

Summary of main challenges and future directions Summary of main challenges and future directions Martin Schumacher Institute of Medical Biometry and Medical Informatics, University Medical Center, Freiburg Workshop October 2008 - F1 Outline Some historical

More information

9/28/2017. Follicular Lymphoma and Nodal Marginal Zone Lymphoma. Follicular Lymphoma Definition. Low-Grade B-Cell Lymphomas in WHO Classification

9/28/2017. Follicular Lymphoma and Nodal Marginal Zone Lymphoma. Follicular Lymphoma Definition. Low-Grade B-Cell Lymphomas in WHO Classification and L. Jeffrey Medeiros, MD DISCLOSURES I do not have anything to disclose Low-Grade B-Cell Lymphomas in WHO Classification Lymphoma Type Frequency Follicular lymphoma 22.1 % Extranodal MALT-lymphoma 7.6

More information

A genome-wide association study identifies vitiligo

A genome-wide association study identifies vitiligo A genome-wide association study identifies vitiligo susceptibility loci at MHC and 6q27 Supplementary Materials Index Supplementary Figure 1 The principal components analysis (PCA) of 2,546 GWAS samples

More information

10CS664: PATTERN RECOGNITION QUESTION BANK

10CS664: PATTERN RECOGNITION QUESTION BANK 10CS664: PATTERN RECOGNITION QUESTION BANK Assignments would be handed out in class as well as posted on the class blog for the course. Please solve the problems in the exercises of the prescribed text

More information

Predicting clinical outcomes in neuroblastoma with genomic data integration

Predicting clinical outcomes in neuroblastoma with genomic data integration Predicting clinical outcomes in neuroblastoma with genomic data integration Ilyes Baali, 1 Alp Emre Acar 1, Tunde Aderinwale 2, Saber HafezQorani 3, Hilal Kazan 4 1 Department of Electric-Electronics Engineering,

More information

Basket and Umbrella Trial Designs in Oncology

Basket and Umbrella Trial Designs in Oncology Basket and Umbrella Trial Designs in Oncology Eric Polley Biomedical Statistics and Informatics Mayo Clinic Polley.Eric@mayo.edu Dose Selection for Cancer Treatment Drugs Stanford Medicine May 2017 1 /

More information

EXPression ANalyzer and DisplayER

EXPression ANalyzer and DisplayER EXPression ANalyzer and DisplayER Tom Hait Aviv Steiner Igor Ulitsky Chaim Linhart Amos Tanay Seagull Shavit Rani Elkon Adi Maron-Katz Dorit Sagir Eyal David Roded Sharan Israel Steinfeld Yossi Shiloh

More information

IX. Is it only about MYC? How to approach the diagnosis of diffuse large B-cell lymphomas

IX. Is it only about MYC? How to approach the diagnosis of diffuse large B-cell lymphomas Hematological Oncology Hematol Oncol 2015; 33: 50 55 Published online in Wiley Online Library (wileyonlinelibrary.com).2217 Supplement Article IX. Is it only about MYC? How to approach the diagnosis of

More information

Lecture 13: Finding optimal treatment policies

Lecture 13: Finding optimal treatment policies MACHINE LEARNING FOR HEALTHCARE 6.S897, HST.S53 Lecture 13: Finding optimal treatment policies Prof. David Sontag MIT EECS, CSAIL, IMES (Thanks to Peter Bodik for slides on reinforcement learning) Outline

More information

For general queries, contact

For general queries, contact Much of the work in Bayesian econometrics has focused on showing the value of Bayesian methods for parametric models (see, for example, Geweke (2005), Koop (2003), Li and Tobias (2011), and Rossi, Allenby,

More information

Aggressive B-Cell Lymphomas

Aggressive B-Cell Lymphomas Aggressive B-cell Lymphomas Aggressive B-Cell Lymphomas Stephen Hamilton Dutoit Institute of Pathology Aarhus Kommunehospital B-lymphoblastic lymphoma Diffuse large cell lymphoma, NOS T-cell / histiocyte-rich;

More information

K MEAN AND FUZZY CLUSTERING ALGORITHM PREDICATED BRAIN TUMOR SEGMENTATION AND AREA ESTIMATION

K MEAN AND FUZZY CLUSTERING ALGORITHM PREDICATED BRAIN TUMOR SEGMENTATION AND AREA ESTIMATION K MEAN AND FUZZY CLUSTERING ALGORITHM PREDICATED BRAIN TUMOR SEGMENTATION AND AREA ESTIMATION Yashwanti Sahu 1, Suresh Gawande 2 1 M.Tech. Scholar, Electronics & Communication Engineering, BERI Bhopal,

More information

BREAST CANCER EPIDEMIOLOGY MODEL:

BREAST CANCER EPIDEMIOLOGY MODEL: BREAST CANCER EPIDEMIOLOGY MODEL: Calibrating Simulations via Optimization Michael C. Ferris, Geng Deng, Dennis G. Fryback, Vipat Kuruchittham University of Wisconsin 1 University of Wisconsin Breast Cancer

More information

Sign Language Recognition using Webcams

Sign Language Recognition using Webcams Sign Language Recognition using Webcams Overview Average person s typing speed Composing: ~19 words per minute Transcribing: ~33 words per minute Sign speaker Full sign language: ~200 words per minute

More information

OVARIAN CANCER POSSIBLE TUMOR-SUPPRESSIVE ROLE OF BATF2 IN OVARIAN CANCER

OVARIAN CANCER POSSIBLE TUMOR-SUPPRESSIVE ROLE OF BATF2 IN OVARIAN CANCER POSSIBLE TUMOR-SUPPRESSIVE ROLE OF BATF2 IN OVARIAN CANCER Rissa Fedora, OMS-III OVARIAN CANCER 5 th leading cause of death among American women Risk factors: Age Gender female Greater lifetime ovulations

More information

Low-grade B-cell lymphoma

Low-grade B-cell lymphoma Low-grade B-cell lymphoma Patho-Basic 11. September 2018 Stephan Dirnhofer Pathology Outline Definition LPL, MBL/CLL/SLL, MCL FL Subtypes & variants Diagnosis including Grading Transformation Summary Be

More information

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory Computational aspects of ChIP-seq John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory ChIP-seq Using highthroughput sequencing to investigate DNA

More information