Classification. Methods Course: Gene Expression Data Analysis -Day Five. Rainer Spang
Transcription
1 Classification Methods Course: Gene Expression Data Analysis -Day Five Rainer Spang
2 Ms. Smith; DNA chip of Ms. Smith; expression profile of Ms. Smith
3 Properties of Ms. Smith. The expression profile: a list of 30,000 numbers... that are all properties of Ms. Smith... some of them reflect her health problem (a tumor)... the profile is a digital image of Ms. Smith's tumor. How can these numbers tell us (predict) whether Ms. Smith has tumor type A or tumor type B?
4 Looking for similarities? Compare Ms. Smith's profile to the profiles of patients with tumor type A and to those of patients with tumor type B.
5 Training and Application. There are patients of known class (the training samples) and patients of unknown class (the new samples), such as Ms. Smith.
6 Statistical Learning. Use the training samples to learn how to predict new samples such as Ms. Smith.
7 Prediction with 1 gene. Color-coded expression levels of the training samples (types A and B). Depending on her expression level, Ms. Smith looks like type A, like type B, or borderline. Which color shade is a good decision boundary?
8 Data optimal decision rule. Use the cutoff with the fewest misclassifications on the training samples (smallest training error). Figure: distribution of expression values in type A and in type B, with the decision boundary and the training error marked.
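The rule on this slide can be sketched in a few lines of Python. This is a minimal illustration and not code from the course: the function name `best_cutoff`, the toy expression values, and the convention that values above the cutoff are called type B are all assumptions.

```python
def best_cutoff(values, labels):
    """Return (cutoff, errors) with the fewest training misclassifications.

    Assumed rule: predict "B" if value > cutoff, otherwise "A".
    """
    pts = sorted(set(values))
    # Candidate cutoffs: one below the minimum, the midpoints between
    # neighbouring distinct values, and one above the maximum.
    candidates = [pts[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    candidates += [pts[-1] + 1.0]

    best = None
    for c in candidates:
        errors = sum((v > c) != (y == "B") for v, y in zip(values, labels))
        if best is None or errors < best[1]:
            best = (c, errors)
    return best


# Hypothetical expression levels of one gene in eight training samples.
values = [1.0, 1.5, 2.0, 2.6, 2.4, 3.0, 3.5, 4.0]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
cutoff, train_errors = best_cutoff(values, labels)
print(cutoff, train_errors)
```

Because the cutoff is tuned on exactly these samples, the reported error count is a training error and tends to be optimistic for new patients.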
9 Overfitting. The decision boundary was chosen to minimize the training error. In a set of new cases, the two distributions of expression values for type A and type B will be similar but not identical. We cannot adjust the decision boundary because we do not know the class of the new samples. Test errors are on average larger than training errors. This phenomenon is called overfitting.
10 Accumulating information across genes. The top gene vs. the average of the top 10 genes. ALL vs. AML, Golub et al.
11 Using a weighted average. With good weights you get an improved separation.
12 The geometry of weighted averages. Calculating a weighted average is identical (up to a scale factor) to projecting the expression profiles orthogonally onto the line defined by the weight vector.
13 Hyperplanes (panels: 2 genes, 3 genes; classes A and B). Together with an offset, the weight vector defines a hyperplane orthogonal to it that cuts the data into two groups.
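A linear signature of this kind reduces to a weighted sum compared against an offset. The sketch below is illustrative only: the names `linear_score` and `classify`, the weights, the offset, and the convention that a positive score means type B are assumptions.

```python
def linear_score(profile, weights, offset):
    """Weighted sum of expression values minus the offset.

    The sign tells on which side of the hyperplane the profile lies;
    the weighted sum itself is (up to scaling by the length of the
    weight vector) the position of the profile along the weight direction.
    """
    return sum(w * x for w, x in zip(weights, profile)) - offset


def classify(profile, weights, offset):
    # Assumed convention: positive side is type B, the rest is type A.
    return "B" if linear_score(profile, weights, offset) > 0 else "A"


# Hypothetical two-gene signature.
weights = [2, 1]
offset = 5
print(classify([1, 1], weights, offset))  # score 2*1 + 1*1 - 5 = -2
print(classify([3, 2], weights, offset))  # score 2*3 + 1*2 - 5 = 3
```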
14 Linear Signatures A B
15 Nearest Centroids
16 Diagonal Linear Discriminant Analysis (DLDA). Rescale the axes according to the variances of the genes.
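The difference between nearest centroids and DLDA can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the variance is a pooled within-class variance per gene, equal class priors are assumed, and genes with zero variance are not handled.

```python
def centroid(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_dlda(X, y):
    """Return class centroids and pooled within-class variances per gene."""
    classes = sorted(set(y))
    cents = {c: centroid([x for x, lab in zip(X, y) if lab == c]) for c in classes}
    n_genes = len(X[0])
    ss = [0.0] * n_genes
    for x, lab in zip(X, y):
        for j in range(n_genes):
            ss[j] += (x[j] - cents[lab][j]) ** 2
    dof = len(X) - len(classes)  # pooled degrees of freedom
    return cents, [s / dof for s in ss]


def predict(cents, variances, x):
    """Assign x to the centroid with the smallest variance-scaled distance."""
    def dist(c):
        return sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, cents[c], variances))
    return min(cents, key=dist)


# Hypothetical data: gene 0 separates the classes, gene 1 is mostly noise.
X = [[1.0, 10.0], [2.0, 30.0], [4.0, 20.0], [5.0, 40.0]]
y = ["A", "A", "B", "B"]
cents, variances = fit_dlda(X, y)

new_profile = [2.0, 38.0]
print(predict(cents, variances, new_profile))   # DLDA: rescaled axes
print(predict(cents, [1.0, 1.0], new_profile))  # plain nearest centroid
```

In this toy case the noisy gene dominates the unscaled distance, so the two rules disagree; rescaling by the variances lets the informative gene decide.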
17 Discriminant Analysis. The data often show evidence of non-identical covariances of genes in the two groups. Hence using LDA, DLDA or NC introduces a bias, but a good bias.
18 Gene Filtering. Rank genes according to a score. Choose the top n genes. Build a signature with these genes only. There are still weights, but most of them are zero. Note that the data decide which are zero and which are not. Limitation: you have no chance of finding a pair of genes that is informative only in combination (the two genes shown in the figure).
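Gene filtering can be sketched as scoring each gene separately and keeping the top n. The score used here, the absolute difference of class means divided by the sum of class standard deviations, is only one possible choice and is an assumption; the slides do not say which score is meant. The names `snr_score` and `top_genes` are hypothetical.

```python
import statistics


def snr_score(a_values, b_values):
    """Signal-to-noise style score for one gene (assumed choice of score).

    Raises ZeroDivisionError if both classes have zero spread.
    """
    spread = statistics.stdev(a_values) + statistics.stdev(b_values)
    return abs(statistics.mean(a_values) - statistics.mean(b_values)) / spread


def top_genes(X, y, n):
    """Return the column indices of the n best-scoring genes."""
    scores = []
    for j, column in enumerate(zip(*X)):
        a = [v for v, lab in zip(column, y) if lab == "A"]
        b = [v for v, lab in zip(column, y) if lab == "B"]
        scores.append((snr_score(a, b), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:n]]


# Hypothetical data: rows are patients, columns are genes.
X = [[1.0, 5.0, 2.0],
     [2.0, 5.5, 9.0],
     [8.0, 5.2, 3.0],
     [9.0, 5.4, 8.0]]
y = ["A", "A", "B", "B"]
print(top_genes(X, y, 2))
```

Because every gene is scored on its own, a gene that carries information only jointly with another gene receives a low score and is filtered out.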
19 How many genes? Is this a biological or a statistical question? Biology: How many genes carry diagnostic information? Statistics: How many genes should we use for classification? The microarray offers genes or more
20 Finding the needle in the haystack. A common myth: classification information is restricted to a small number of genes, and the challenge is to find them.
21 The Avalanche. Aggressive lymphomas with and without a MYC breakpoint (MYC-neg vs. MYC-pos). Verbundprojekt maligne Lymphome
22 Genes, Bias & Overfitting. The gap between training error and test error becomes wider as more genes are included. There is a good statistical reason for not including hundreds of genes in a model, even if they are biologically affected.
23 Soft Thresholding. The shrunken centroid method and the PAM package (Tibshirani et al. 2002).
24 Centroid Shrinkage
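Soft thresholding moves each per-gene difference toward zero by a fixed amount and sets it to exactly zero once it crosses. The sketch below shows only this operation; it leaves out the standardization of the differences that the shrunken centroid method applies first, and the function name and example values are assumptions.

```python
def soft_threshold(d, delta):
    """Shrink d toward zero by delta; values within delta of zero become 0."""
    magnitude = abs(d) - delta
    if magnitude <= 0:
        return 0.0
    return magnitude if d > 0 else -magnitude


# Hypothetical per-gene differences between a class centroid and the
# overall centroid, shrunken with threshold 1.0.
differences = [3.0, -0.5, 0.2, -2.5]
shrunken = [soft_threshold(d, 1.0) for d in differences]
surviving = [j for j, d in enumerate(shrunken) if d != 0.0]
print(shrunken, surviving)
```

Genes whose shrunken difference is zero no longer influence the classification, so raising the threshold reduces the number of genes in the signature.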
25 How much shrinkage is good in PAM? Use cross-validation (repeated Train/Select splits): compute the CV performance for several values of D and pick the D that gives the smallest number of CV misclassifications. This is adaptive model selection; PAM does this routinely.
26 Model Selection: output of PAM. Small D, many genes: poor performance due to overfitting. High D, few genes: poor performance due to lack of information (underfitting). The optimal D is somewhere in the middle.
27 Signatures you miss. Assume protein A binds to protein B and inhibits it. The clinical phenotype is caused by active protein A. The predictive information is in the expression of A minus the expression of B. Naïve idea: Don't calculate weights based on single-gene scores, but optimize over all possible hyperplanes.
28 Two different signatures based on the same genes. Calling signature genes markers for a certain disease is misleading!
29 Only one of these problems exists. Problem 1: No separating line. Problem 2: Many separating lines. Why is this a problem?
30 What about Ms. Smith? This problem is also related to overfitting... more soon
31 Prediction with 30,000 genes. With the microarray we have more genes than patients. Think about this in three dimensions: there are three genes, two patients with known diagnosis (red and yellow), and Ms. Smith (green). There is always one plane separating red and yellow with Ms. Smith on the yellow side, and a second separating plane with Ms. Smith on the red side. OK, if all points fall onto one line, it does not always work. However, for measured values this is very unlikely and never happens in practice.
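The three-dimensional thought experiment can be checked numerically. The sketch below uses made-up integer expression values and, as a simplification, only looks for planes through the origin (offset zero); it solves for a weight vector that puts red on the positive side and yellow on the negative side, with Ms. Smith on whichever side we ask for. All names and values are assumptions.

```python
from fractions import Fraction


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(rows, targets):
    """Solve rows . w = targets exactly by Cramer's rule."""
    d = det3(rows)
    if d == 0:
        # Degenerate configuration: no unique solution for this simple approach.
        raise ValueError("profiles are linearly dependent")
    w = []
    for k in range(3):
        mk = [list(r) for r in rows]
        for r, t in zip(mk, targets):
            r[k] = t  # replace column k by the target vector
        w.append(Fraction(det3(mk), d))
    return w


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical expression values of three genes.
red, yellow, smith = (5, 1, 2), (1, 4, 2), (3, 3, 6)

w_yellow = solve3([red, yellow, smith], [1, -1, -1])  # Ms. Smith with yellow
w_red = solve3([red, yellow, smith], [1, -1, 1])      # Ms. Smith with red

for w in (w_yellow, w_red):
    print(dot(w, red), dot(w, yellow), dot(w, smith))
```

Both weight vectors separate the two training patients perfectly, yet they give opposite diagnoses for Ms. Smith, which is the point of the slide.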
32 The overfitting disaster. From the data alone we cannot decide which genes are important for the diagnosis, nor can we give a reliable diagnosis for a new patient. This has little to do with medicine. It is a geometrical problem.
33 Consequences. If you find a separating signature, it does not (yet) mean that you have a top publication; in most cases it means nothing.
34 Meaningless vs. meaningful signatures. There always exist separating signatures caused by overfitting (meaningless signatures). Hopefully there is also a separating signature caused by a disease mechanism (a meaningful signature). We need to learn how to find and validate meaningful signatures.
35 How to distinguish a meaningful signature from a meaningless signature? The meaningless signature might be separating (small training error), but it will not be predictive (large error in applications). The goal is not a separating signature but a predictive signature: good performance in clinical practice!
36 Back to the problem of many separating hyperplanes. Which hyperplane is the best?
37 Support Vector Machines (large margin classifiers). Fat planes: With an infinitely thin plane the data can always be separated correctly, but not necessarily with a fat one. Again, if a large-margin separation exists, chances are good that we have found something relevant.
38 Maximal Margin Hyperplane. There are theoretical results that the size of the margin correlates with the test (!) error (V. Vapnik). SVMs are optimized not only to fit the training data but directly for predictive performance.
39 Non-separable training set. Penalty of error: distance to the hyperplane multiplied by a parameter c. Balance over- and underfitting.
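The trade-off controlled by c can be made concrete with the standard textbook soft-margin objective, which may differ in detail from the formulation on the slide. In this sketch the labels are coded as +1 and -1, and the function name, weights, and data are assumptions.

```python
def soft_margin_objective(w, b, X, y, c):
    """0.5 * |w|^2 + c * (sum of margin violations).

    A sample violates the margin when y * (w . x - b) < 1; the size of
    the violation grows with its distance beyond the margin.
    """
    norm_term = 0.5 * sum(wi * wi for wi in w)
    violations = 0.0
    for x, label in zip(X, y):
        margin = label * (sum(wi * xi for wi, xi in zip(w, x)) - b)
        violations += max(0.0, 1.0 - margin)
    return norm_term + c * violations


# Hypothetical data: the third sample sits on the wrong side.
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, -1]
w, b = [1.0, 0.0], 0.0

print(soft_margin_objective(w, b, X, y, c=2.0))
print(soft_margin_objective(w, b, X, y, c=0.1))
```

A large c makes violations expensive and pushes the fit toward the training data; a small c tolerates violations in exchange for a wider margin.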
40 Independent Validation. The accuracy of a signature on the data it was learned from is biased because of the overfitting phenomenon. Validation of a signature requires independent test data (test error).
41 Test Sets. Split the data into test and training data. Figure: three example splits, labeled ok, ok, and mistake.
42 Selection Bias. The test data must not be used for gene selection or adaptive model selection; otherwise the observed accuracy is biased (selection bias).
43 Cross Validation. You cannot evaluate a fitted classification model (= signature) using cross-validation. Cross-validation only evaluates the algorithm with which the signature was built. Gene selection must be repeated for every relearning step in the cross-validation (in-the-loop gene selection).
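In-the-loop gene selection means that the selection step runs again inside every fold, on that fold's training part only. The sketch below is a minimal illustration with assumptions throughout: a very simple gene score (absolute difference of class means), a nearest-centroid rule on the selected genes, folds formed by sample index modulo k instead of a random stratified split, and toy data.

```python
def class_means(X, y, label):
    rows = [x for x, lab in zip(X, y) if lab == label]
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(X, y, n_genes):
    """Select genes and compute centroids from the given samples only."""
    ma, mb = class_means(X, y, "A"), class_means(X, y, "B")
    order = sorted(range(len(ma)), key=lambda j: abs(ma[j] - mb[j]), reverse=True)
    return order[:n_genes], ma, mb


def predict(model, x):
    genes, ma, mb = model
    da = sum((x[j] - ma[j]) ** 2 for j in genes)
    db = sum((x[j] - mb[j]) ** 2 for j in genes)
    return "A" if da <= db else "B"


def cv_errors(X, y, k, n_genes):
    """Count misclassifications over k folds; k = len(X) is leave-one-out."""
    errors = 0
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        # Gene selection happens inside train(), so it never sees the
        # held-out samples of this fold.
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx], n_genes)
        errors += sum(predict(model, X[i]) != y[i] for i in test_idx)
    return errors


# Hypothetical data: gene 0 is informative, gene 1 is not.
X = [[1.0, 5.0], [9.0, 4.0], [1.2, 6.0], [8.8, 5.0], [0.8, 4.0], [9.2, 6.0]]
y = ["A", "B", "A", "B", "A", "B"]
print(cv_errors(X, y, k=3, n_genes=1))
print(cv_errors(X, y, k=len(X), n_genes=1))
```

The returned error count describes the whole procedure (selection plus training), not any single fitted signature, since each fold may select different genes.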
44 Leave-one-out Cross-Validation. Essentially the same, but you leave only one sample out at a time and predict it using the others. Good for small training sets.
45 Performance Estimation. Estimators of performance have a variance, which can be high. The chance that a meaningless signature produces 100% accuracy on test data is high if the test data include only a few patients. Figure labels: nested 10-fold CV, variance from 100 random partitions; reuse of the same data, no sample variance.
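A back-of-the-envelope calculation shows why small test sets are risky. It assumes, purely for illustration, that a meaningless signature gets each test patient right independently with probability 0.5; the function names are hypothetical.

```python
def p_perfect_by_chance(n_test, p_correct=0.5):
    """Probability that all n_test predictions are right by luck alone."""
    return p_correct ** n_test


def p_any_perfect(n_test, n_signatures, p_correct=0.5):
    """Probability that at least one of several independent meaningless
    signatures reaches 100% accuracy on the same number of test patients."""
    return 1.0 - (1.0 - p_perfect_by_chance(n_test, p_correct)) ** n_signatures


for n in (5, 10, 20):
    print(n, p_perfect_by_chance(n), p_any_perfect(n, 100))
```

With five test patients a single coin-flipping signature is perfect about 3% of the time, and if a hundred such signatures are tried, a perfect one is almost certain to appear.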
46 External Validation and Documentation. Documenting a signature is conceptually different from giving a list of genes, although this is what most publications give you. In order to validate a signature on external data or apply it in practice: - All model parameters need to be specified - The scale of the normalized data to which the model refers needs to be specified (add-on normalization)
47 Establishing a signature. Split the data into training and test data. Training data only: machine learning (select genes, find the optimal number of genes, learn model parameters). Test data only: internal validation. Then: full quantitative specification and external validations.
48 DOs AND DON'Ts:
1. Decide on your diagnosis model (PAM, SVM, etc.) and don't change your mind later on.
2. Split your profiles randomly into a training set and a test set.
3. Put the data in the test set away... far away.
4. Train your model using only the data in the training set (select genes, define centroids, calculate normal vectors for large margin separators, perform adaptive model selection...). Don't even think of touching the test data at this time.
5. Apply the model to the test data... don't even think of changing the model at this time.
6. Do steps 1-5 only once and accept the result... don't even think of optimizing this procedure.
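The random split in step 2 can be sketched as a single, seeded shuffle that is run once and recorded. The function name, the test fraction, and the seed below are assumptions chosen for illustration.

```python
import random


def split_once(sample_ids, test_fraction, seed):
    """Randomly split sample IDs into (training, test) lists.

    The fixed seed makes the split reproducible, so it can be documented
    and never has to be redrawn.
    """
    rng = random.Random(seed)
    ids = list(sample_ids)
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


# Hypothetical study with 30 profiles, one fifth held out for testing.
train_ids, test_ids = split_once(range(30), test_fraction=0.2, seed=2024)
print(len(train_ids), len(test_ids))
```

Everything in step 4 would then operate on `train_ids` only, and `test_ids` would be used exactly once in step 5.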
49 Questions?
50 The bias variance trade off
51 Which model is best? Experience: linear models work fine. Sparse data: expression data in high dimensions is sparse. The data does not contain enough information to identify nonlinear structures adequately, even if they exist.
More informationChapter 19. Confidence Intervals for Proportions. Copyright 2010 Pearson Education, Inc.
Chapter 19 Confidence Intervals for Proportions Copyright 2010 Pearson Education, Inc. Standard Error Both of the sampling distributions we ve looked at are Normal. For proportions For means SD pˆ pq n
More informationLarge-Scale Statistical Modelling via Machine Learning Classifiers
J. Stat. Appl. Pro. 2, No. 3, 203-222 (2013) 203 Journal of Statistics Applications & Probability An International Journal http://dx.doi.org/10.12785/jsap/020303 Large-Scale Statistical Modelling via Machine
More informationExploration and Exploitation in Reinforcement Learning
Exploration and Exploitation in Reinforcement Learning Melanie Coggan Research supervised by Prof. Doina Precup CRA-W DMP Project at McGill University (2004) 1/18 Introduction A common problem in reinforcement
More informationVariable Features Selection for Classification of Medical Data using SVM
Variable Features Selection for Classification of Medical Data using SVM Monika Lamba USICT, GGSIPU, Delhi, India ABSTRACT: The parameters selection in support vector machines (SVM), with regards to accuracy
More informationReveal Relationships in Categorical Data
SPSS Categories 15.0 Specifications Reveal Relationships in Categorical Data Unleash the full potential of your data through perceptual mapping, optimal scaling, preference scaling, and dimension reduction
More informationLearning with Rare Cases and Small Disjuncts
Appears in Proceedings of the 12 th International Conference on Machine Learning, Morgan Kaufmann, 1995, 558-565. Learning with Rare Cases and Small Disjuncts Gary M. Weiss Rutgers University/AT&T Bell
More informationIdentifikation von Risikofaktoren in der koronaren Herzchirurgie
Identifikation von Risikofaktoren in der koronaren Herzchirurgie Julia Schiffner 1 Erhard Godehardt 2 Stefanie Hillebrand 1 Alexander Albert 2 Artur Lichtenberg 2 Claus Weihs 1 1 Fakultät Statistik, Technische
More informationResearch methods in sensation and perception. (and the princess and the pea)
Research methods in sensation and perception (and the princess and the pea) Sensory Thresholds We can measure stuff in the world like pressure, sound, light, etc. We can't easily measure your psychological
More informationA measure of the impact of CV incompleteness on prediction error estimation with application to PCA and normalization
Hornung et al. BMC Medical Research Methodology () :95 DOI 10.16/s4-0-0088-9 RESEARCH ARTICLE Open Access A measure of the impact of CV incompleteness on prediction error estimation with application to
More informationTowards Open Set Deep Networks: Supplemental
Towards Open Set Deep Networks: Supplemental Abhijit Bendale*, Terrance E. Boult University of Colorado at Colorado Springs {abendale,tboult}@vast.uccs.edu In this supplement, we provide we provide additional
More informationNational Surgical Adjuvant Breast and Bowel Project (NSABP) Foundation Annual Progress Report: 2009 Formula Grant
National Surgical Adjuvant Breast and Bowel Project (NSABP) Foundation Annual Progress Report: 2009 Formula Grant Reporting Period July 1, 2011 June 30, 2012 Formula Grant Overview The National Surgical
More informationAdjustment of systematic microarray data biases
Adjustment of systematic microarray data biases Monica Benito 1, Joel Parker 2, Quan Du 3, Lambert Skoog 3, Annika Lindblom 3, Charles M. Perou 2# and J. S. Marron 4# 1 Department of Statistics and Econometrics
More informationBayes Linear Statistics. Theory and Methods
Bayes Linear Statistics Theory and Methods Michael Goldstein and David Wooff Durham University, UK BICENTENNI AL BICENTENNIAL Contents r Preface xvii 1 The Bayes linear approach 1 1.1 Combining beliefs
More informationChapter 8 Estimating with Confidence
Chapter 8 Estimating with Confidence Introduction Our goal in many statistical settings is to use a sample statistic to estimate a population parameter. In Chapter 4, we learned if we randomly select the
More informationEvaluating Classifiers for Disease Gene Discovery
Evaluating Classifiers for Disease Gene Discovery Kino Coursey Lon Turnbull khc0021@unt.edu lt0013@unt.edu Abstract Identification of genes involved in human hereditary disease is an important bioinfomatics
More informationROC Curve. Brawijaya Professional Statistical Analysis BPSA MALANG Jl. Kertoasri 66 Malang (0341)
ROC Curve Brawijaya Professional Statistical Analysis BPSA MALANG Jl. Kertoasri 66 Malang (0341) 580342 ROC Curve The ROC Curve procedure provides a useful way to evaluate the performance of classification
More informationPopulation. Sample. AP Statistics Notes for Chapter 1 Section 1.0 Making Sense of Data. Statistics: Data Analysis:
Section 1.0 Making Sense of Data Statistics: Data Analysis: Individuals objects described by a set of data Variable any characteristic of an individual Categorical Variable places an individual into one
More information