Colon cancer subtypes from gene expression data
1 Colon cancer subtypes from gene expression data
Nathan Cunningham, Giuseppe Di Benedetto, Sherman Ip, Leon Law
Module 6: Applied Statistics
26th February 2016
2 Aim
- Replicate the findings of Felipe De Sousa et al. (2013)
- Use cluster analysis to identify subtypes of colon cancer
- Construct a classifier to identify the clusters
- Identify a suitable subset of the data on which to perform these analyses
- Consider the robustness of the findings to changes in methods and perturbations in the data
3 Data
- GSE33113 data set (Academic Medical Centre, Amsterdam)
- Patients with stage II colon cancer
- 90 patients, 54,675 gene expressions recorded
4 Data processing
- Normalisation to remove batch effects
- Gene expression presence detected using the barcode algorithm; genes not present in at least one sample were removed
- Genes with a median absolute deviation > 0.5 retained and median centred
- Felipe De Sousa et al. (2013) find that 7,846 genes remain; we find that anywhere from none to all of the genes remain
- Use the 146 genes identified by Felipe De Sousa et al. (2013) in the analyses
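The filtering and centring step can be sketched in a few lines. This is only an illustration, not the pipeline used in the slides: the gene names and values are made up, and the median absolute deviation is computed without any scaling constant (some implementations multiply it by a consistency factor, which would change which genes pass the 0.5 cutoff).

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation of a list of numbers."""
    m = median(values)
    return median(abs(v - m) for v in values)


def filter_and_centre(expr, cutoff=0.5):
    """Keep genes whose MAD exceeds the cutoff, then subtract each gene's median."""
    kept = {}
    for gene, values in expr.items():
        if mad(values) > cutoff:
            m = median(values)
            kept[gene] = [v - m for v in values]
    return kept


# Hypothetical expression values for two genes across five samples.
expr = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],  # varies across samples, retained
    "gene_b": [2.0, 2.1, 2.0, 1.9, 2.0],  # nearly constant, removed
}
result = filter_and_centre(expr)
print(result)
```

A sensitivity of this kind (the scaling constant, or the order of normalisation and filtering) is one plausible reason the number of retained genes can differ between analyses.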
5 Cluster Analysis
- Hierarchical agglomerative, average linkage
- K-Means
- Consensus
- Model-based (Fraley & Raftery, 2002)
6 How many clusters?
8 Clustering methods comparison
- Homogeneity: reflects the compactness of the clusters
- Separation: reflects the distance between clusters
- Silhouette: $s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$
- WADP (weighted average discrepant pairs)
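As a worked illustration of the silhouette value, the sketch below takes a(i) to be the mean distance from sample i to the other members of its own cluster and b(i) to be the smallest mean distance from sample i to the members of any other cluster. Euclidean distance, the toy points, and the convention of returning 0 for degenerate cases are assumptions made for the example.

```python
import math


def silhouette(points, labels):
    """Silhouette value s(i) for every point, using Euclidean distance."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from point i to every other point, grouped by cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Two well separated toy clusters in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette(points, labels)
print(scores)
```

Values close to 1 indicate that a sample sits well inside its cluster, values near 0 indicate a sample on the boundary, and negative values suggest a sample may be in the wrong cluster.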
9 Robustness under perturbation
[Figure: value against sd, one series per variable: cons_kmeans, mclust, cons_hierclust]
10 Cluster methods comparison
Table: WADP value by cluster comparison
- C-k-means vs C-hierarchical
- MClust vs C-hierarchical
- C-k-means vs MClust 0.081
15 Classification: PAM
- R package implementing nearest shrunken centroid classification.
- Gives higher weights to genes whose class centroid is far from the overall centroid.
- A threshold parameter sets the amount of shrinkage applied to the weights, giving higher weights to genes which are stable within the class.
- Can eliminate genes with weaker effects, allowing automatic feature selection.
- A sample is assigned to the class whose shrunken centroid is nearest.
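The shrinkage idea can be sketched as follows. This is a deliberately simplified illustration and not the PAM package itself: it soft-thresholds the raw difference between each class centroid and the overall centroid, whereas the published method first standardises that difference by a within-class standard deviation. The data, class names and threshold value are invented for the example.

```python
def soft_threshold(d, delta):
    """Move d towards zero by delta, setting it to zero if |d| <= delta."""
    if d > delta:
        return d - delta
    if d < -delta:
        return d + delta
    return 0.0


def shrunken_centroids(X, y, delta):
    """Overall centroid and per-class centroids shrunk towards it."""
    n_genes = len(X[0])
    overall = [sum(row[g] for row in X) / len(X) for g in range(n_genes)]
    centroids = {}
    for k in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == k]
        centroids[k] = [
            overall[g]
            + soft_threshold(sum(r[g] for r in rows) / len(rows) - overall[g], delta)
            for g in range(n_genes)
        ]
    return overall, centroids


def predict(x, centroids):
    """Assign x to the class with the nearest shrunken centroid."""
    return min(
        centroids,
        key=lambda k: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[k])),
    )


# Toy data: gene 0 separates the classes, gene 1 does not.
X = [[0.0, 1.0], [0.2, 1.1], [4.0, 0.9], [4.2, 1.0]]
y = ["A", "A", "B", "B"]
overall, centroids = shrunken_centroids(X, y, delta=0.5)

# A gene survives if at least one class centroid still differs from the overall one.
surviving = [
    g for g in range(len(overall))
    if any(c[g] != overall[g] for c in centroids.values())
]
print(surviving, predict([0.5, 1.0], centroids))
```

Raising the threshold removes more genes from the classifier, which is why the threshold is usually chosen by cross validation.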
18 Classification: Multi-Class SVM
- The R package e1071 was used to fit the multi-class SVM with an RBF kernel.
- Uses a one-vs-one approach (i.e. 3 binary classifiers), with the class predicted by a voting scheme.
- If a linear kernel were used instead, feature selection could be performed by ranking the features by their weights.
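The one-vs-one voting scheme can be illustrated without an SVM at all. In the sketch below, each pairwise classifier is a hypothetical stand-in (nearest of two class centres on a single feature) rather than a trained SVM, and ties are broken by whichever class was counted first, which is an arbitrary choice made for the example.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, pairwise):
    """Let every pairwise classifier vote and return the most voted class."""
    votes = Counter(pairwise[pair](x) for pair in combinations(classes, 2))
    return votes.most_common(1)[0][0]


# Hypothetical one-dimensional class centres standing in for trained binary SVMs.
centres = {1: 0.0, 2: 5.0, 3: 10.0}
classes = sorted(centres)

pairwise = {
    (a, b): (lambda x, a=a, b=b: a if abs(x - centres[a]) <= abs(x - centres[b]) else b)
    for a, b in combinations(classes, 2)
}

print(len(pairwise))                               # three classes give three classifiers
print(one_vs_one_predict(4.2, classes, pairwise))
print(one_vs_one_predict(9.0, classes, pairwise))
```

With k classes this approach trains k(k-1)/2 binary classifiers, which is where the count of 3 comes from when there are three clusters.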
21 Classification: Random Forest
- The R package randomForest was used to train a random forest.
- A total of 300 trees were built, with 12 variables randomly chosen as candidates at each split.
- Feature selection can be done using the mean decrease in accuracy, which is computed by permuting each feature and measuring the change in out-of-bag error.
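The permutation idea behind mean decrease in accuracy can be sketched on its own. The example below is an assumption-laden simplification: it uses a fixed toy classifier and a single evaluation set instead of a forest and per-tree out-of-bag samples, and the data are invented.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        X_perm = [
            row[:feature] + [v] + row[feature + 1:]
            for row, v in zip(X, column)
        ]
        drops.append(baseline - accuracy(predict, X_perm, y))
    return sum(drops) / n_repeats


# Toy classifier that only looks at feature 0; feature 1 is noise.
def predict(row):
    return int(row[0] > 0.5)


X = [[0.1, 3.0], [0.2, 1.0], [0.9, 2.0], [0.8, 4.0], [0.3, 5.0], [0.7, 0.0]]
y = [0, 0, 1, 1, 0, 1]

imp_informative = permutation_importance(predict, X, y, feature=0)
imp_noise = permutation_importance(predict, X, y, feature=1)
print(imp_informative, imp_noise)
```

Features whose permutation barely changes the accuracy are candidates for removal, which is how the measure supports feature selection.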
22 Results: PAM
[Plots: misclassification error against value of threshold, with the number of genes on a second axis; misclassification error per label against value of threshold]
Figure: 10-fold cross validation error. Optimal threshold was estimated to be 6.2 ± 0.2.
24 Results: SVM and Random Forest and PAM
Method: 10-fold cross validation error
- SVM (C = 1, γ = ): 1.1%
- PAM (threshold = 6.2): 2.2%
- Random Forest: 3.3%
Table: 10-fold cross validation average error on the trained classifiers
Error bars can be estimated using bootstrapping.
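A percentile bootstrap for an error rate can be sketched as below. This is a simplification under stated assumptions: it resamples per-sample error indicators, whereas bootstrapping the full analysis would resample the data and rerun the cross validation each time. The indicator vector (2 errors in 90 samples) is hypothetical, and reading the interval as the central 95% of the bootstrap distribution is one possible interpretation.

```python
import random
from statistics import median


def bootstrap_error(errors, n_boot=500, seed=0):
    """Median and central 95% percentile interval of a resampled error rate."""
    rng = random.Random(seed)
    n = len(errors)
    # Resample the indicators with replacement and recompute the error rate.
    rates = sorted(sum(rng.choices(errors, k=n)) / n for _ in range(n_boot))
    lower = rates[int(0.025 * (n_boot - 1))]
    upper = rates[int(0.975 * (n_boot - 1))]
    return median(rates), lower, upper


# Hypothetical indicators: 1 marks a misclassified sample, 0 a correct one.
errors = [1] * 2 + [0] * 88
med, lower, upper = bootstrap_error(errors)
print(f"median {100 * med:.1f}%, interval {100 * lower:.1f}% to {100 * upper:.1f}%")
```

With so few errors the interval is wide relative to the point estimate, which is worth keeping in mind when comparing classifiers whose error rates differ by one or two samples.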
25 Results: PAM Bootstrapping
[Plot: validation error (%) against threshold (unknown units)]
Figure: Median (point) and 95% percentile (error bar) of the 10-fold cross validation error, bootstrapping 500 times.
26 Results: PAM Bootstrapping
[Plot: number of genes which survived thresholding against threshold (unknown units)]
Figure: Mean (point) and standard deviation (error bar) of the number of genes which survived thresholding, bootstrapping 500 times.
27 Results: PAM Bootstrapping
Table: Median and 95% percentile of the 10-fold cross validation error, bootstrapping 500 times.
- Methods: PAM (threshold = 0.0), PAM (threshold = 6.2), SVM, Random Forest
- 10-fold cross validation error: ( 1.1) % ( 2.2) % ( 0.0) % ) %
For PAM with threshold 6.2, (36.5 ± 6.3) genes survived thresholding.
28 Conclusion
- Clustering methods were robust
- PAM performed similarly to the other methods
- More thresholds to be investigated
- Scale to larger datasets
29 References
- Felipe De Sousa, E. M., Wang, X., Jansen, M., Fessler, E., Trinh, A., de Rooij, L. P., ... others (2013). Poor-prognosis colon cancer is defined by a molecularly distinct subtype and develops from serrated precursor lesions. Nature Medicine, 19(5).
- Fraley, C., & Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458).
Sixth Conference on Artificial Intelligence Applications to Environmental Science 88th AMS Annual Meeting, New Orleans, LA 20-24 January 2008 J2.6 Imputation of missing with nonlinear relationships Michael
More informationAutomated Assessment of Diabetic Retinal Image Quality Based on Blood Vessel Detection
Y.-H. Wen, A. Bainbridge-Smith, A. B. Morris, Automated Assessment of Diabetic Retinal Image Quality Based on Blood Vessel Detection, Proceedings of Image and Vision Computing New Zealand 2007, pp. 132
More informationExpanded View Figures
Solip Park & Ben Lehner Epistasis is cancer type specific Molecular Systems Biology Expanded View Figures A B G C D E F H Figure EV1. Epistatic interactions detected in a pan-cancer analysis and saturation
More information7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.
Supplementary Figure 1 7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans. Regions targeted by the Even and Odd ChIRP probes mapped to a secondary structure model 56 of the
More informationT. R. Golub, D. K. Slonim & Others 1999
T. R. Golub, D. K. Slonim & Others 1999 Big Picture in 1999 The Need for Cancer Classification Cancer classification very important for advances in cancer treatment. Cancers of Identical grade can have
More informationMODEL SELECTION STRATEGIES. Tony Panzarella
MODEL SELECTION STRATEGIES Tony Panzarella Lab Course March 20, 2014 2 Preamble Although focus will be on time-to-event data the same principles apply to other outcome data Lab Course March 20, 2014 3
More informationIntegration of multiple types of genetic markers for neuroblastoma may contribute to improved prediction of the overall survival
Polewko-Klim et al. Biology Direct (2018) 13:17 https://doi.org/10.1186/s13062-018-0222-9 RESEARCH Open Access Integration of multiple types of genetic markers for neuroblastoma may contribute to improved
More informationRNA preparation from extracted paraffin cores:
Supplementary methods, Nielsen et al., A comparison of PAM50 intrinsic subtyping with immunohistochemistry and clinical prognostic factors in tamoxifen-treated estrogen receptor positive breast cancer.
More informationAutomated Medical Diagnosis using K-Nearest Neighbor Classification
(IMPACT FACTOR 5.96) Automated Medical Diagnosis using K-Nearest Neighbor Classification Zaheerabbas Punjani 1, B.E Student, TCET Mumbai, Maharashtra, India Ankush Deora 2, B.E Student, TCET Mumbai, Maharashtra,
More information