Statistics 202: Data Mining. © Jonathan Taylor. Final review. Based in part on slides from textbook, slides of Susan Holmes.
1 Final review. Based in part on slides from textbook, slides of Susan Holmes. December 5
2 Final review: overview. Before midterm: general goals of data mining; datatypes; preprocessing & dimension reduction; distances; multidimensional scaling; multidimensional arrays; decision trees; performance measures for classifiers; discriminant analysis.
3 Final review: overview. After midterm: more classifiers (rule-based classifiers, nearest-neighbour classifiers, naive Bayes classifiers, neural networks, support vector machines, random forests, boosting (AdaBoost / gradient boosting)); clustering; outlier detection.
4 Rule-based classifiers: example. R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds. R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes. R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals. R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles. R5: (Live in Water = sometimes) → Amphibians.
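The five rules can be tried out directly. The following is a minimal sketch, assuming the rules are applied in order and the first rule that covers a record decides its class (the slide does not say how conflicts are resolved); the records `hawk` and `bear` and the function names are made up for illustration.

```python
# Ordered rule list transcribed from the slide: (antecedent, class label).
RULES = [
    ({"Give Birth": "no", "Can Fly": "yes"}, "Birds"),
    ({"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),
    ({"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),
    ({"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),
    ({"Live in Water": "sometimes"}, "Amphibians"),
]


def covers(conditions, record):
    """A rule covers a record when every attribute test in its antecedent holds."""
    return all(record.get(attr) == value for attr, value in conditions.items())


def classify(record, rules=RULES, default=None):
    """Return the label of the first covering rule (assumed ordering), else default."""
    for conditions, label in rules:
        if covers(conditions, record):
            return label
    return default


# Hypothetical records, not taken from the slides.
hawk = {"Give Birth": "no", "Can Fly": "yes", "Live in Water": "no", "Blood Type": "warm"}
bear = {"Give Birth": "yes", "Can Fly": "no", "Live in Water": "no", "Blood Type": "warm"}

print(classify(hawk))  # Birds
print(classify(bear))  # Mammals
```

A record that no rule covers falls through to `default`, which is one way to see why exhaustivity matters for a rule set.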
5 Rule-based classifiers: concepts. Coverage; accuracy; mutual exclusivity; exhaustivity; Laplace accuracy.
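Three of these concepts are simple counts. The sketch below uses the usual textbook definitions, which are assumed here rather than quoted from the slide: coverage is the fraction of records that satisfy the rule's antecedent, accuracy is the fraction of covered records with the predicted class, and Laplace accuracy is (correct + 1) / (covered + number of classes). The toy records are hypothetical.

```python
def rule_metrics(records, labels, conditions, predicted, n_classes):
    """Coverage, accuracy and Laplace accuracy of one rule on a labelled data set."""
    covered = [
        y for r, y in zip(records, labels)
        if all(r.get(attr) == value for attr, value in conditions.items())
    ]
    n = len(covered)
    correct = sum(1 for y in covered if y == predicted)
    coverage = n / len(records)
    accuracy = correct / n if n else 0.0
    laplace = (correct + 1) / (n + n_classes)
    return coverage, accuracy, laplace


# Hypothetical data: the rule (Give Birth = no) -> Birds covers 3 of 4 records.
records = [{"Give Birth": "no"}, {"Give Birth": "no"}, {"Give Birth": "yes"}, {"Give Birth": "no"}]
labels = ["Birds", "Reptiles", "Mammals", "Birds"]

coverage, accuracy, laplace = rule_metrics(records, labels, {"Give Birth": "no"}, "Birds", n_classes=3)
print(coverage, accuracy, laplace)  # coverage 0.75, accuracy 2/3, Laplace accuracy 0.5
```

The Laplace version pulls the estimate towards 1 / (number of classes) when a rule covers few records.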
6 Nearest neighbour classifiers. Basic idea: if it walks like a duck and quacks like a duck, then it's probably a duck. Compute the distance from the test record to the training records, then choose the k nearest records. (Figure from Tan, Steinbach, Kumar.)
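The two steps (compute distances, choose the k nearest records) can be sketched as follows. This assumes Euclidean distance and a plain majority vote among the k neighbours; the training points and the function name `knn_predict` are illustrative only.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training records."""
    # Distance from the test record to every training record, nearest first.
    by_distance = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set with two well-separated classes.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))  # a
print(knn_predict(train_X, train_y, (5.5, 5.5), k=3))  # b
```

With k = 6 every training record votes for every query point, which illustrates how a neighbourhood that is too large stops being local.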
7 Nearest neighbour classifier. If k is too large, the neighborhood may include points from other classes.
8 Naive Bayes classifiers. Model: $P(Y = c \mid X_1 = x_1, \dots, X_p = x_p) \propto \left( \prod_{l=1}^{p} P(X_l = x_l \mid Y = c) \right) P(Y = c)$. For continuous features, typically a 1-dimensional QDA model is used (i.e. Gaussian within each class). For discrete features, use the Laplace-smoothed probabilities $P(X_j = l \mid Y = c) = \frac{\#\{i : X_{ij} = l,\ Y_i = c\} + \alpha}{\#\{Y_i = c\} + \alpha k}$.
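For discrete features the model reduces to counting. The sketch below assumes that k in the smoothed estimate is the number of levels of feature j (the slide does not define k) and works in log space to avoid underflow; the data set and all names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(X, y, n_levels, alpha=1.0):
    """Fit a discrete naive Bayes model; n_levels[j] plays the role of k for feature j."""
    class_counts = Counter(y)
    counts = defaultdict(int)  # (j, level, c) -> #{i : X_ij = level, Y_i = c}
    for xi, c in zip(X, y):
        for j, level in enumerate(xi):
            counts[(j, level, c)] += 1
    n = len(y)

    def cond_prob(j, level, c):
        """Laplace-smoothed estimate of P(X_j = level | Y = c)."""
        return (counts[(j, level, c)] + alpha) / (class_counts[c] + alpha * n_levels[j])

    def predict(x):
        """Return the class maximising log P(Y = c) + sum_j log P(X_j = x_j | Y = c)."""
        scores = {}
        for c, nc in class_counts.items():
            score = math.log(nc / n)
            for j, level in enumerate(x):
                score += math.log(cond_prob(j, level, c))
            scores[c] = score
        return max(scores, key=scores.get)

    return cond_prob, predict


# Hypothetical data: two binary features, two classes.
X = [(1, 0), (1, 1), (0, 0), (0, 1), (0, 0)]
y = ["spam", "spam", "ham", "ham", "ham"]
cond_prob, predict = fit_naive_bayes(X, y, n_levels=[2, 2], alpha=1.0)

print(cond_prob(0, 1, "spam"))  # (2 + 1) / (2 + 2) = 0.75
print(predict((1, 0)))          # spam
```

Without smoothing, P(X_1 = 1 | ham) would be estimated as 0 here and would veto the class outright; with alpha = 1 it becomes 1/5.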
9 Neural networks: single layer. Artificial Neural Networks (ANN).
10 Neural networks: double layer.
11 Support vector machines.
12 Support vector machines. Solves the problem: $\text{minimize}_{\beta,\alpha,\xi}\ \|\beta\|_2$ subject to $y_i(x_i^T \beta + \alpha) \geq 1 - \xi_i$, $\xi_i \geq 0$, $\sum_{i=1}^{n} \xi_i \leq C$.
13 Support vector machines: non-separable problems. The $\xi_i$'s can be removed from this problem, yielding $\text{minimize}_{\beta,\alpha}\ \|\beta\|_2^2 + \gamma \sum_{i=1}^{n} (1 - y_i f_{\alpha,\beta}(x_i))_+$, where $(z)_+ = \max(z, 0)$ is the positive part function. Or, $\text{minimize}_{\beta,\alpha}\ \sum_{i=1}^{n} (1 - y_i f_{\alpha,\beta}(x_i))_+ + \lambda \|\beta\|_2^2$.
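The second (hinge loss plus ridge penalty) form can be minimised numerically. The sketch below assumes the linear decision function f(x) = xᵀβ + α, matching the constraint on the previous slide, and uses plain subgradient descent, which is a simple illustrative choice and not necessarily how SVM software solves the problem. The data, step size and iteration count are arbitrary.

```python
def f(x, beta, alpha):
    """Assumed linear decision function f(x) = x^T beta + alpha."""
    return sum(b * xj for b, xj in zip(beta, x)) + alpha


def hinge_objective(X, y, beta, alpha, lam):
    """sum_i (1 - y_i f(x_i))_+ + lam * ||beta||_2^2."""
    hinge = sum(max(1 - yi * f(xi, beta, alpha), 0.0) for xi, yi in zip(X, y))
    return hinge + lam * sum(b * b for b in beta)


def subgradient_step(X, y, beta, alpha, lam, step):
    """One subgradient descent step on the hinge objective."""
    g_beta = [2 * lam * b for b in beta]
    g_alpha = 0.0
    for xi, yi in zip(X, y):
        if 1 - yi * f(xi, beta, alpha) > 0:  # only margin violators contribute
            g_beta = [g - yi * xj for g, xj in zip(g_beta, xi)]
            g_alpha -= yi
    new_beta = [b - step * g for b, g in zip(beta, g_beta)]
    return new_beta, alpha - step * g_alpha


# Hypothetical linearly separable data with labels in {-1, +1}.
X = [(2, 2), (3, 3), (-2, -2), (-3, -1)]
y = [1, 1, -1, -1]

beta, alpha, lam = [0.0, 0.0], 0.0, 0.01
start = hinge_objective(X, y, beta, alpha, lam)
for _ in range(200):
    beta, alpha = subgradient_step(X, y, beta, alpha, lam, step=0.01)
end = hinge_objective(X, y, beta, alpha, lam)
print(start, end)
```

Points with margin y_i f(x_i) ≥ 1 contribute nothing to the subgradient, which is the loss-function counterpart of only the support vectors mattering.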
14 Logistic vs. SVM. [Figure with curves labelled Logistic and SVM.]
15 Ensemble methods: general idea.
16 Ensemble methods: bagging / random forests. In this method, one takes several bootstrap samples (samples with replacement) of the data. For each bootstrap sample $S_b$, $1 \leq b \leq B$, fit a model, retaining the classifier $f^{*,b}$. After all models have been fit, use majority vote: $f(x)$ = majority vote of $(f^{*,b}(x))_{1 \leq b \leq B}$. We also defined the OOB (out-of-bag) estimate of error.
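A minimal sketch of bagging with an out-of-bag (OOB) error estimate follows. The base learner `one_nn` (1-nearest neighbour on a scalar feature) is a stand-in chosen only to keep the example self-contained; random forests use randomised trees instead. The OOB estimate here lets each training point be voted on only by the models whose bootstrap sample did not contain it, which is the usual definition and is assumed to match the one given in class.

```python
import random
from collections import Counter


def one_nn(sample_X, sample_y):
    """Hypothetical base learner: 1-nearest neighbour on scalar features."""
    def predict(x):
        nearest = min(range(len(sample_X)), key=lambda j: abs(sample_X[j] - x))
        return sample_y[nearest]
    return predict


def bagging(X, y, fit, B=25, seed=0):
    """Fit B models on bootstrap samples; return the voting classifier and OOB error."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample S_b
        model = fit([X[i] for i in idx], [y[i] for i in idx])
        models.append((model, set(idx)))

    def predict(x):
        """Majority vote over all B fitted classifiers."""
        return Counter(m(x) for m, _ in models).most_common(1)[0][0]

    # OOB error: point i is judged only by models that never saw it.
    errors, counted = 0, 0
    for i in range(n):
        votes = Counter(m(X[i]) for m, in_bag in models if i not in in_bag)
        if votes:
            counted += 1
            errors += votes.most_common(1)[0][0] != y[i]
    oob_error = errors / counted if counted else float("nan")
    return predict, oob_error


# Hypothetical one-dimensional data with two separated classes.
X = [0.0, 0.5, 1.0, 1.5, 2.0, 8.0, 8.5, 9.0, 9.5, 10.0]
y = ["a"] * 5 + ["b"] * 5
predict, oob_error = bagging(X, y, one_nn, B=25, seed=0)
print(predict(1.2), predict(9.1), oob_error)
```

Because each bootstrap sample omits roughly a third of the points, the OOB estimate behaves like a built-in validation set and needs no separate hold-out data.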
17 Ensemble methods: illustrating AdaBoost. [Figure: data points for training, with initial weights for each data point.]
18 Ensemble methods: illustrating AdaBoost. [Figure from Tan, Steinbach, Kumar.]
19 Ensemble methods: boosting as gradient descent. It turns out that boosting can be thought of as something like gradient descent. In some sense, the boosting algorithm is a steepest descent algorithm to find $\operatorname{argmin}_{f \in \mathcal{F}} \sum_{i=1}^{n} L(y_i, f(x_i))$.
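One common way to make the steepest descent analogy concrete is sketched below. The iteration index $m$, the step size $\rho_m$ and the gradient vector $g_m$ are notation introduced here, not taken from the slide: at each round the loss is differentiated with respect to the fitted values, and the update moves against that gradient.

```latex
g_{im} = \left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f = f_{m-1}},
\qquad
f_m = f_{m-1} - \rho_m \, g_m ,
\qquad
\rho_m = \operatorname*{argmin}_{\rho} \sum_{i=1}^{n} L\bigl(y_i,\ f_{m-1}(x_i) - \rho \, g_{im}\bigr).
```

Since $g_m$ is only defined at the training points, boosting replaces it with a weak learner fitted to $-g_m$, so that the update stays inside the class $\mathcal{F}$ and extends to new $x$.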
20 What is cluster analysis? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups. Intra-cluster distances are minimized; inter-cluster distances are maximized. (Tan, Steinbach, Kumar.)
21 Clustering: types of clustering. Partitional: a division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset. Hierarchical: a set of nested clusters organized as a hierarchical tree. Each data object is in exactly one subset for any horizontal cut of the tree.
22 Cluster analysis: a partitional example. [Figure: simulated data in the plane (axes X1, X2), clustered into three classes (represented by orange, blue and green) by the K-means clustering algorithm.]
23 K-means: Gap statistic. [Figure, left panel: observed (green) and expected (blue) values of log W_K against the number of clusters for the simulated data; both curves have been translated to equal zero at one cluster. Right panel: Gap curve against the number of clusters, equal to the difference between the observed and expected values of log W_K. The Gap estimate of K is the smallest K producing a gap within one standard deviation of the gap at K + 1.]
24 K-medoid algorithm. Same as K-means, except that the centroid is estimated not by the average, but by the observation having minimum pairwise distance with the other cluster members. Advantage: the centroid is one of the observations, which is useful, e.g., when features are 0 or 1. Also, one only needs pairwise distances for K-medoids rather than the raw observations.
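The point about needing only pairwise distances can be seen in a short sketch. This version alternates assignment and medoid update in the same way K-means alternates assignment and averaging; the function names, the initial medoids and the toy data are all hypothetical, and "minimum pairwise distance" is read as minimum total distance to the other cluster members.

```python
def assign(D, medoids):
    """Label each observation with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: D[i][medoids[c]]) for i in range(len(D))]


def k_medoids(D, medoids, max_iter=100):
    """D: symmetric matrix of pairwise distances; medoids: initial observation indices."""
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(D, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # New medoid: the member with the smallest total distance to the others.
            best = min(members, key=lambda m: sum(D[m][j] for j in members)) if members else old
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(D, medoids)


# Hypothetical scalar observations; only their distance matrix is passed on.
points = [0, 1, 2, 10, 11, 12]
D = [[abs(p - q) for q in points] for p in points]

medoids, labels = k_medoids(D, medoids=[0, 1])
print(medoids, labels)  # [1, 4] [0, 0, 0, 1, 1, 1]
```

Note that `k_medoids` never sees `points`, so any dissimilarity (for example one defined on 0/1 features) can be substituted for the absolute difference.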
25 Silhouette plot.
26 Cluster analysis: a hierarchical example. [Figure: dendrogram from agglomerative hierarchical clustering with average linkage to the human tumor microarray data; leaves are labelled by tumor type.]
27 Hierarchical clustering: concepts. Top-down vs. bottom-up. Different linkages: single linkage (minimum distance); complete linkage (maximum distance).
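The two linkages differ only in how the distance between two clusters is computed from pairwise distances. Below is a minimal bottom-up (agglomerative) sketch on a hypothetical four-point data set; the function names are illustrative, and the quadratic search for the closest pair is written for clarity rather than speed.

```python
def single_linkage(D, A, B):
    """Distance between clusters A and B: minimum pairwise distance."""
    return min(D[i][j] for i in A for j in B)


def complete_linkage(D, A, B):
    """Distance between clusters A and B: maximum pairwise distance."""
    return max(D[i][j] for i in A for j in B)


def agglomerate(D, linkage):
    """Bottom-up clustering; returns the list of merges as (cluster, cluster, height)."""
    clusters = [[i] for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of current clusters that is closest under the chosen linkage.
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(D, clusters[pq[0]], clusters[pq[1]]),
        )
        height = linkage(D, clusters[a], clusters[b])
        merges.append((sorted(clusters[a]), sorted(clusters[b]), height))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


# Hypothetical scalar observations and their distance matrix.
points = [0, 1, 3, 7]
D = [[abs(p - q) for q in points] for p in points]

single_heights = [h for _, _, h in agglomerate(D, single_linkage)]
complete_heights = [h for _, _, h in agglomerate(D, complete_linkage)]
print(single_heights)    # [1, 2, 4]
print(complete_heights)  # [1, 3, 7]
```

The merge order is the same here, but the merge heights, and therefore where a horizontal cut of the dendrogram splits the data, depend on the linkage.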
28 Mixture models. Similar to K-means, but assignment to clusters is soft. Often applied with multivariate normal as the model within classes. EM algorithm used to fit the model: (1) estimate responsibilities; (2) estimate within-class parameters, replacing labels (unobserved) with responsibilities.
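The two EM steps can be sketched for the simplest case, a one-dimensional Gaussian mixture (the slides mention multivariate normals; the univariate case is used here only to keep the code short). The data, the starting values and the variance floor of 1e-6 are arbitrary choices for illustration.

```python
import math


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(x, mu, var, pi, n_iter=50):
    """EM for a 1-D Gaussian mixture; mu, var, pi hold one starting value per component."""
    mu, var, pi = list(mu), list(var), list(pi)
    K = len(mu)
    for _ in range(n_iter):
        # E-step: responsibilities r[i][k] = P(component k | x_i).
        r = []
        for xi in x:
            w = [pi[k] * normal_pdf(xi, mu[k], var[k]) for k in range(K)]
            total = sum(w)
            r.append([wk / total for wk in w])
        # M-step: weighted estimates, with responsibilities in place of labels.
        for k in range(K):
            nk = sum(ri[k] for ri in r)
            mu[k] = sum(ri[k] * xi for ri, xi in zip(r, x)) / nk
            var[k] = max(sum(ri[k] * (xi - mu[k]) ** 2 for ri, xi in zip(r, x)) / nk, 1e-6)
            pi[k] = nk / len(x)
    return mu, var, pi, r  # r comes from the final E-step


# Hypothetical data: two groups centred near -2 and +2.
x = [-2.1, -1.9, -2.0, -1.8, -2.2, 2.0, 2.1, 1.9, 2.2, 1.8]
mu, var, pi, r = em_gmm_1d(x, mu=[-1.0, 1.0], var=[1.0, 1.0], pi=[0.5, 0.5])
print(mu, pi)
```

Replacing each row of `r` by a 0/1 indicator of its largest entry turns the M-step for the means into the K-means centroid update, which is the sense in which the assignment here is "soft".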
29 Model-based clustering: summary. (1) Choose a type of mixture model (e.g. multivariate Normal) and a maximum number of clusters, K. (2) Use a specialized hierarchical clustering technique: model-based hierarchical agglomeration. (3) Use the clusters from the previous step to initialize EM for the mixture model. (4) Use BIC to compare different mixture models and models with different numbers of clusters.
30 Outliers.
31 Outliers: general steps. Build a profile of the "normal" behavior. Use these summary statistics to detect anomalies, i.e. points whose characteristics are very far from the normal profile. General types of schemes involve a statistical model of "normal", and "far" is measured in terms of likelihood. Example: Grubbs' test chooses an outlier threshold to control the Type I error of any declared outliers if the data does actually follow the model.
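As a sketch of the "profile, then threshold" idea, the code below builds a Gaussian profile (sample mean and standard deviation) and computes the Grubbs statistic G = max |x_i − mean| / sd. The critical value is deliberately left as an input: in Grubbs' test it is derived from a t-distribution quantile for the chosen Type I error and sample size, and the value 2.0 used in the example is a placeholder, not a tabulated value. The data are hypothetical.

```python
import math


def grubbs_statistic(x):
    """Return (G, index), where G = max_i |x_i - mean| / sd and index is the arg max."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    i = max(range(n), key=lambda j: abs(x[j] - mean))
    return abs(x[i] - mean) / sd, i


def flag_outlier(x, critical_value):
    """Index of the most extreme point if G exceeds the supplied threshold, else None."""
    g, i = grubbs_statistic(x)
    return i if g > critical_value else None


# Hypothetical measurements with one suspicious value.
x = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 15.0]
g, i = grubbs_statistic(x)
print(round(g, 2), i)        # 2.26 6
print(flag_outlier(x, 2.0))  # 6 (2.0 is a placeholder threshold)
```

Because the suspect point inflates both the mean and the standard deviation, G is bounded by (n − 1) / √n, which is one reason the threshold has to depend on the sample size.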
32 32 / 1
# Assessment of gene expression levels between several cell group types is a common application of the unsupervised technique.
# Aleksey Morozov # Microarray Data Analysis Using Hierarchical Clustering. # The "unsupervised learning" approach deals with data that has the features X1,X2...Xp, but does not have an associated response
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016 Exam policy: This exam allows one one-page, two-sided cheat sheet; No other materials. Time: 80 minutes. Be sure to write your name and
More informationOutlier Analysis. Lijun Zhang
Outlier Analysis Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Extreme Value Analysis Probabilistic Models Clustering for Outlier Detection Distance-Based Outlier Detection Density-Based
More informationColon cancer subtypes from gene expression data
Colon cancer subtypes from gene expression data Nathan Cunningham Giuseppe Di Benedetto Sherman Ip Leon Law Module 6: Applied Statistics 26th February 2016 Aim Replicate findings of Felipe De Sousa et
More informationGene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering
Gene expression analysis Roadmap Microarray technology: how it work Applications: what can we do with it Preprocessing: Image processing Data normalization Classification Clustering Biclustering 1 Gene
More informationEECS 433 Statistical Pattern Recognition
EECS 433 Statistical Pattern Recognition Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 19 Outline What is Pattern
More informationGene Selection for Tumor Classification Using Microarray Gene Expression Data
Gene Selection for Tumor Classification Using Microarray Gene Expression Data K. Yendrapalli, R. Basnet, S. Mukkamala, A. H. Sung Department of Computer Science New Mexico Institute of Mining and Technology
More informationMachine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017
Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017 A.K.A. Artificial Intelligence Unsupervised learning! Cluster analysis Patterns, Clumps, and Joining
More informationReview: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections
Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections New: Bias-variance decomposition, biasvariance tradeoff, overfitting, regularization, and feature selection Yi
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationPredicting Breast Cancer Survival Using Treatment and Patient Factors
Predicting Breast Cancer Survival Using Treatment and Patient Factors William Chen wchen808@stanford.edu Henry Wang hwang9@stanford.edu 1. Introduction Breast cancer is the leading type of cancer in women
More informationKnowledge Discovery and Data Mining I
Ludwig-Maximilians-Universität München Lehrstuhl für Datenbanksysteme und Data Mining Prof. Dr. Thomas Seidl Knowledge Discovery and Data Mining I Winter Semester 2018/19 Introduction What is an outlier?
More informationClass Outlier Detection. Zuzana Pekarčíková
Class Outlier Detection Zuzana Pekarčíková Outline λ What is an Outlier? λ Applications of Outlier Detection λ Types of Outliers λ Outlier Detection Methods Types λ Basic Outlier Detection Methods λ High-dimensional
More informationPredicting Kidney Cancer Survival from Genomic Data
Predicting Kidney Cancer Survival from Genomic Data Christopher Sauer, Rishi Bedi, Duc Nguyen, Benedikt Bünz Abstract Cancers are on par with heart disease as the leading cause for mortality in the United
More informationComparison of discrimination methods for the classification of tumors using gene expression data
Comparison of discrimination methods for the classification of tumors using gene expression data Sandrine Dudoit, Jane Fridlyand 2 and Terry Speed 2,. Mathematical Sciences Research Institute, Berkeley
More information10CS664: PATTERN RECOGNITION QUESTION BANK
10CS664: PATTERN RECOGNITION QUESTION BANK Assignments would be handed out in class as well as posted on the class blog for the course. Please solve the problems in the exercises of the prescribed text
More informationIntroduction to Discrimination in Microarray Data Analysis
Introduction to Discrimination in Microarray Data Analysis Jane Fridlyand CBMB University of California, San Francisco Genentech Hall Auditorium, Mission Bay, UCSF October 23, 2004 1 Case Study: Van t
More informationNearest Shrunken Centroid as Feature Selection of Microarray Data
Nearest Shrunken Centroid as Feature Selection of Microarray Data Myungsook Klassen Computer Science Department, California Lutheran University 60 West Olsen Rd, Thousand Oaks, CA 91360 mklassen@clunet.edu
More informationClassification. Methods Course: Gene Expression Data Analysis -Day Five. Rainer Spang
Classification Methods Course: Gene Expression Data Analysis -Day Five Rainer Spang Ms. Smith DNA Chip of Ms. Smith Expression profile of Ms. Smith Ms. Smith 30.000 properties of Ms. Smith The expression
More informationEfficacy of the Extended Principal Orthogonal Decomposition Method on DNA Microarray Data in Cancer Detection
202 4th International onference on Bioinformatics and Biomedical Technology IPBEE vol.29 (202) (202) IASIT Press, Singapore Efficacy of the Extended Principal Orthogonal Decomposition on DA Microarray
More informationAlgorithms Implemented for Cancer Gene Searching and Classifications
Algorithms Implemented for Cancer Gene Searching and Classifications Murad M. Al-Rajab and Joan Lu School of Computing and Engineering, University of Huddersfield Huddersfield, UK {U1174101,j.lu}@hud.ac.uk
More informationGender Based Emotion Recognition using Speech Signals: A Review
50 Gender Based Emotion Recognition using Speech Signals: A Review Parvinder Kaur 1, Mandeep Kaur 2 1 Department of Electronics and Communication Engineering, Punjabi University, Patiala, India 2 Department
More informationSTAT 151B. Administrative Info. Statistics 151B: Introduction Modern Statistical Prediction and Machine Learning. Overview and introduction
Statistics 151B: Modern Statistical Prediction and Machine Learning Overview and introduction information Homepage: http://www.stat.berkeley.edu/ jon/ stat-151b-spring-2012 All announcements and materials
More informationComputer Age Statistical Inference. Algorithms, Evidence, and Data Science. BRADLEY EFRON Stanford University, California
Computer Age Statistical Inference Algorithms, Evidence, and Data Science BRADLEY EFRON Stanford University, California TREVOR HASTIE Stanford University, California ggf CAMBRIDGE UNIVERSITY PRESS Preface
More informationPredicting Diabetes and Heart Disease Using Features Resulting from KMeans and GMM Clustering
Predicting Diabetes and Heart Disease Using Features Resulting from KMeans and GMM Clustering Kunal Sharma CS 4641 Machine Learning Abstract Clustering is a technique that is commonly used in unsupervised
More informationPart [2.1]: Evaluation of Markers for Treatment Selection Linking Clinical and Statistical Goals
Part [2.1]: Evaluation of Markers for Treatment Selection Linking Clinical and Statistical Goals Patrick J. Heagerty Department of Biostatistics University of Washington 174 Biomarkers Session Outline
More informationBrain Tumour Detection of MR Image Using Naïve Beyer classifier and Support Vector Machine
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Brain Tumour Detection of MR Image Using Naïve
More informationRadiotherapy Outcomes
in partnership with Outcomes Models with Machine Learning Sarah Gulliford PhD Division of Radiotherapy & Imaging sarahg@icr.ac.uk AAPM 31 st July 2017 Making the discoveries that defeat cancer Radiotherapy
More informationMachine Learning for Predicting Delayed Onset Trauma Following Ischemic Stroke
Machine Learning for Predicting Delayed Onset Trauma Following Ischemic Stroke Anthony Ma 1, Gus Liu 1 Department of Computer Science, Stanford University, Stanford, CA 94305 Stroke is currently the third
More information1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp
The Stata Journal (22) 2, Number 3, pp. 28 289 Comparative assessment of three common algorithms for estimating the variance of the area under the nonparametric receiver operating characteristic curve
More informationGeneralized additive model for disease risk prediction
Generalized additive model for disease risk prediction Guodong Chen Chu Kochen Honors College, Zhejiang University Channing Division of Network Medicine, BWH & HMS Advised by: Prof. Yang-Yu Liu 1 Is it
More informationSelection and Combination of Markers for Prediction
Selection and Combination of Markers for Prediction NACC Data and Methods Meeting September, 2010 Baojiang Chen, PhD Sarah Monsell, MS Xiao-Hua Andrew Zhou, PhD Overview 1. Research motivation 2. Describe
More informationClass discovery in Gene Expression Data: Characterizing Splits by Support Vector Machines
Class discovery in Gene Expression Data: Characterizing Splits by Support Vector Machines Florian Markowetz and Anja von Heydebreck Max-Planck-Institute for Molecular Genetics Computational Molecular Biology
More informationClustering analysis of cancerous microarray data
Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(9): 488-493 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Clustering analysis of cancerous microarray data
More informationBiomarker adaptive designs in clinical trials
Review Article Biomarker adaptive designs in clinical trials James J. Chen 1, Tzu-Pin Lu 1,2, Dung-Tsa Chen 3, Sue-Jane Wang 4 1 Division of Bioinformatics and Biostatistics, National Center for Toxicological
More informationBoosted PRIM with Application to Searching for Oncogenic Pathway of Lung Cancer
Boosted PRIM with Application to Searching for Oncogenic Pathway of Lung Cancer Pei Wang Department of Statistics Stanford University Stanford, CA 94305 wp57@stanford.edu Young Kim, Jonathan Pollack Department
More informationAn Improved Algorithm To Predict Recurrence Of Breast Cancer
An Improved Algorithm To Predict Recurrence Of Breast Cancer Umang Agrawal 1, Ass. Prof. Ishan K Rajani 2 1 M.E Computer Engineer, Silver Oak College of Engineering & Technology, Gujarat, India. 2 Assistant
More informationSupporting Information Identification of Amino Acids with Sensitive Nanoporous MoS 2 : Towards Machine Learning-Based Prediction
Supporting Information Identification of Amino Acids with Sensitive Nanoporous MoS 2 : Towards Machine Learning-Based Prediction Amir Barati Farimani, Mohammad Heiranian, Narayana R. Aluru 1 Department
More informationIdentification of Tissue Independent Cancer Driver Genes
Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important
More informationNeurons and neural networks II. Hopfield network
Neurons and neural networks II. Hopfield network 1 Perceptron recap key ingredient: adaptivity of the system unsupervised vs supervised learning architecture for discrimination: single neuron perceptron
More informationNature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.
Supplementary Figure 1 Behavioral training. a, Mazes used for behavioral training. Asterisks indicate reward location. Only some example mazes are shown (for example, right choice and not left choice maze
More informationBREAST CANCER EPIDEMIOLOGY MODEL:
BREAST CANCER EPIDEMIOLOGY MODEL: Calibrating Simulations via Optimization Michael C. Ferris, Geng Deng, Dennis G. Fryback, Vipat Kuruchittham University of Wisconsin 1 University of Wisconsin Breast Cancer
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 1, Jan Feb 2017
RESEARCH ARTICLE Classification of Cancer Dataset in Data Mining Algorithms Using R Tool P.Dhivyapriya [1], Dr.S.Sivakumar [2] Research Scholar [1], Assistant professor [2] Department of Computer Science
More informationEvaluating Classifiers for Disease Gene Discovery
Evaluating Classifiers for Disease Gene Discovery Kino Coursey Lon Turnbull khc0021@unt.edu lt0013@unt.edu Abstract Identification of genes involved in human hereditary disease is an important bioinfomatics
More informationTITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS)
TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS) AUTHORS: Tejas Prahlad INTRODUCTION Acute Respiratory Distress Syndrome (ARDS) is a condition
More informationPrediction of Successful Memory Encoding from fmri Data
Prediction of Successful Memory Encoding from fmri Data S.K. Balci 1, M.R. Sabuncu 1, J. Yoo 2, S.S. Ghosh 3, S. Whitfield-Gabrieli 2, J.D.E. Gabrieli 2 and P. Golland 1 1 CSAIL, MIT, Cambridge, MA, USA
More informationHybridized KNN and SVM for gene expression data classification
Mei, et al, Hybridized KNN and SVM for gene expression data classification Hybridized KNN and SVM for gene expression data classification Zhen Mei, Qi Shen *, Baoxian Ye Chemistry Department, Zhengzhou
More informationAccurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers
Journal of Medical Systems (2018) 42: 92 https://doi.org/10.1007/s10916-018-0940-7 Accurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers Md. Maniruzzaman 1,2
More informationClassification with microarray data
Classification with microarray data Aron Charles Eklund eklund@cbs.dtu.dk DNA Microarray Analysis - #27612 January 8, 2010 The rest of today Now: What is classification, and why do we do it? How to develop
More informationComputational Cognitive Science
Computational Cognitive Science Lecture 19: Contextual Guidance of Attention Chris Lucas (Slides adapted from Frank Keller s) School of Informatics University of Edinburgh clucas2@inf.ed.ac.uk 20 November
More informationDiagnosis of multiple cancer types by shrunken centroids of gene expression
Diagnosis of multiple cancer types by shrunken centroids of gene expression Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu PNAS 99:10:6567-6572, 14 May 2002 Nearest Centroid
More informationSupplementary Materials Extracting a Cellular Hierarchy from High-dimensional Cytometry Data with SPADE
Supplementary Materials Extracting a Cellular Hierarchy from High-dimensional Cytometry Data with SPADE Peng Qiu1,4, Erin F. Simonds2, Sean C. Bendall2, Kenneth D. Gibbs Jr.2, Robert V. Bruggner2, Michael
More informationPAIRED AND UNPAIRED COMPARISON AND CLUSTERING WITH GENE EXPRESSION DATA
Statistica Sinica 12(2002), 87-110 PAIRED AND UNPAIRED COMPARISON AND CLUSTERING WITH GENE EXPRESSION DATA Jenny Bryan 1, Katherine S. Pollard 2 and Mark J. van der Laan 2 1 University of British Columbia
More informationAccurate molecular classification of cancer using simple rules.
University of Nebraska Medical Center DigitalCommons@UNMC Journal Articles: Genetics, Cell Biology & Anatomy Genetics, Cell Biology & Anatomy 10-30-2009 Accurate molecular classification of cancer using
More informationData analysis in microarray experiment
16 1 004 Chinese Bulletin of Life Sciences Vol. 16, No. 1 Feb., 004 1004-0374 (004) 01-0041-08 100005 Q33 A Data analysis in microarray experiment YANG Chang, FANG Fu-De * (National Laboratory of Medical
More informationMayuri Takore 1, Prof.R.R. Shelke 2 1 ME First Yr. (CSE), 2 Assistant Professor Computer Science & Engg, Department
Data Mining Techniques to Find Out Heart Diseases: An Overview Mayuri Takore 1, Prof.R.R. Shelke 2 1 ME First Yr. (CSE), 2 Assistant Professor Computer Science & Engg, Department H.V.P.M s COET, Amravati
More informationPredicting Breast Cancer Recurrence Using Machine Learning Techniques
Predicting Breast Cancer Recurrence Using Machine Learning Techniques Umesh D R Department of Computer Science & Engineering PESCE, Mandya, Karnataka, India Dr. B Ramachandra Department of Electrical and
More informationStatistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies
Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies Stanford Biostatistics Workshop Pierre Neuvial with Henrik Bengtsson and Terry Speed Department of Statistics, UC Berkeley
More informationContributions to Brain MRI Processing and Analysis
Contributions to Brain MRI Processing and Analysis Dissertation presented to the Department of Computer Science and Artificial Intelligence By María Teresa García Sebastián PhD Advisor: Prof. Manuel Graña
More informationT. R. Golub, D. K. Slonim & Others 1999
T. R. Golub, D. K. Slonim & Others 1999 Big Picture in 1999 The Need for Cancer Classification Cancer classification very important for advances in cancer treatment. Cancers of Identical grade can have
More informationSupplemental Figures. Figure S1: 2-component Gaussian mixture model of Bourdeau et al. s fold-change distribution
Supplemental Figures Figure S1: 2-component Gaussian mixture model of Bourdeau et al. s fold-change distribution 1 All CTCF Unreponsive RefSeq Frequency 0 2000 4000 6000 0 500 1000 1500 2000 Block Size
More informationarxiv: v2 [cs.lg] 30 Oct 2013
Prediction of breast cancer recurrence using Classification Restricted Boltzmann Machine with Dropping arxiv:1308.6324v2 [cs.lg] 30 Oct 2013 Jakub M. Tomczak Wrocław University of Technology Wrocław, Poland
More informationIntelligent Systems. Discriminative Learning. Parts marked by * are optional. WS2013/2014 Carsten Rother, Dmitrij Schlesinger
Intelligent Systems Discriminative Learning Parts marked by * are optional 30/12/2013 WS2013/2014 Carsten Rother, Dmitrij Schlesinger Discriminative models There exists a joint probability distribution
More informationReview of Chronic Kidney Disease based on Data Mining Techniques
Review of Chronic Kidney Disease based on Data Mining Techniques S.Dilli Arasu Research Scholar, Department of Computer Applications (Ph.D.), Bharath Institute of Higher Education and Research (BIHER),
More informationJournal: Nature Methods
Journal: Nature Methods Article Title: Network-based stratification of tumor mutations Corresponding Author: Trey Ideker Supplementary Item Supplementary Figure 1 Supplementary Figure 2 Supplementary Figure
More informationNature Medicine: doi: /nm.3967
Supplementary Figure 1. Network clustering. (a) Clustering performance as a function of inflation factor. The grey curve shows the median weighted Silhouette widths for varying inflation factors (f [1.6,
More informationRoadmap for Developing and Validating Therapeutically Relevant Genomic Classifiers. Richard Simon, J Clin Oncol 23:
Roadmap for Developing and Validating Therapeutically Relevant Genomic Classifiers. Richard Simon, J Clin Oncol 23:7332-7341 Presented by Deming Mi 7/25/2006 Major reasons for few prognostic factors to
More informationIntroduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018
Introduction to Machine Learning Katherine Heller Deep Learning Summer School 2018 Outline Kinds of machine learning Linear regression Regularization Bayesian methods Logistic Regression Why we do this
More informationClassification of Microarray Gene Expression Data
Classification of Microarray Gene Expression Data Geoff McLachlan Department of Mathematics & Institute for Molecular Bioscience University of Queensland Institute for Molecular Bioscience, University
More informationData Mining in Bioinformatics Day 7: Clustering in Bioinformatics
Data Mining in Bioinformatics Day 7: Clustering in Bioinformatics Karsten Borgwardt February 21 to March 4, 2011 Machine Learning & Computational Biology Research Group MPIs Tübingen Karsten Borgwardt:
More informationClassification of Synapses Using Spatial Protein Data
Classification of Synapses Using Spatial Protein Data Jenny Chen and Micol Marchetti-Bowick CS229 Final Project December 11, 2009 1 MOTIVATION Many human neurological and cognitive disorders are caused
More informationMammogram Analysis: Tumor Classification
Mammogram Analysis: Tumor Classification Term Project Report Geethapriya Raghavan geeragh@mail.utexas.edu EE 381K - Multidimensional Digital Signal Processing Spring 2005 Abstract Breast cancer is the
More informationSVM-Kmeans: Support Vector Machine based on Kmeans Clustering for Breast Cancer Diagnosis
SVM-Kmeans: Support Vector Machine based on Kmeans Clustering for Breast Cancer Diagnosis Walaa Gad Faculty of Computers and Information Sciences Ain Shams University Cairo, Egypt Email: walaagad [AT]
More informationChapter 1. Introduction
Chapter 1 Introduction 1.1 Motivation and Goals The increasing availability and decreasing cost of high-throughput (HT) technologies coupled with the availability of computational tools and data form a
More informationABSTRACT I. INTRODUCTION. Mohd Thousif Ahemad TSKC Faculty Nagarjuna Govt. College(A) Nalgonda, Telangana, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 1 ISSN : 2456-3307 Data Mining Techniques to Predict Cancer Diseases
More informationVisualizing Data for Hypothesis Generation Using Large-Volume Health care Claims Data
Visualizing Data for Hypothesis Generation Using Large-Volume Health care Claims Data Eberechukwu Onukwugha PhD, School of Pharmacy, UMB Margret Bjarnadottir PhD, Smith School of Business, UMCP Shujia
More informationReliability of Ordination Analyses
Reliability of Ordination Analyses Objectives: Discuss Reliability Define Consistency and Accuracy Discuss Validation Methods Opening Thoughts Inference Space: What is it? Inference space can be defined
More informationReader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model
Reader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model Ruifeng Xu, Chengtian Zou, Jun Xu Key Laboratory of Network Oriented Intelligent Computation, Shenzhen Graduate School,
More informationProbability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data
Probability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data Tong WW, McComb ME, Perlman DH, Huang H, O Connor PB, Costello
More informationDiagnosis of Breast Cancer Using Ensemble of Data Mining Classification Methods
International Journal of Bioinformatics and Biomedical Engineering Vol. 1, No. 3, 2015, pp. 318-322 http://www.aiscience.org/journal/ijbbe ISSN: 2381-7399 (Print); ISSN: 2381-7402 (Online) Diagnosis of
More informationBLOOD GLUCOSE PREDICTION MODELS FOR PERSONALIZED DIABETES MANAGEMENT
BLOOD GLUCOSE PREDICTION MODELS FOR PERSONALIZED DIABETES MANAGEMENT A Thesis Submitted to the Graduate Faculty of the North Dakota State University of Agriculture and Applied Science By Warnakulasuriya
More informationAutomatic Hemorrhage Classification System Based On Svm Classifier
Automatic Hemorrhage Classification System Based On Svm Classifier Abstract - Brain hemorrhage is a bleeding in or around the brain which are caused by head trauma, high blood pressure and intracranial
More informationAClass: A Simple, Online Probabilistic Classifier. Vikash K. Mansinghka Computational Cognitive Science Group MIT BCS/CSAIL
AClass: A Simple, Online Probabilistic Classifier Vikash K. Mansinghka Computational Cognitive Science Group MIT BCS/CSAIL AClass: A Simple, Online Probabilistic Classifier or How I learned to stop worrying