INTRODUCTION TO MACHINE LEARNING. Decision tree learning
|
|
- Ella Marshall
- 6 years ago
- Views:
Transcription
1 INTRODUCTION TO MACHINE LEARNING Decision tree learning
2 Task of classification Automatically assign class to observations with features Observation: vector of features, with a class Automatically assign class to new observation with features, using previous observations Binary classification: two classes Multiclass classification: more than two classes
3 Example A dataset consisting of persons Features: age, weight and income Class: binary: happy or not happy multiclass: happy, satisfied or not happy
4 Examples of features Features can be numerical age: 23, 25, 75, height: 175.3, 179.5, Features can be categorical travel_class: first class, business class, coach class smokes?: yes, no
5 The decision tree Suppose you re classifying patients as sick or not sick Intuitive way of classifying: ask questions Is the patient young or old?
6 The decision tree Suppose you re classifying patients as sick or not sick Intuitive way of classifying: ask questions Is the patient young or old? Old
7 The decision tree Suppose you re classifying patients as sick or not sick Intuitive way of classifying: ask questions Is the patient young or old? Old Smoked for more than 10 years?
8 The decision tree Suppose you re classifying patients as sick or not sick Intuitive way of classifying: ask questions Is the patient young or old? Young Old Vaccinated against the measles? Smoked for more than 10 years?
9 The decision tree Suppose you re classifying patients as sick or not sick Intuitive way of classifying: ask questions Is the patient young or old? Young Old Vaccinated against the measles? Smoked for more than 10 years? Yes No Yes No
10 The decision tree Suppose you re classifying patients as sick or not sick Intuitive way of classifying: ask questions Is the patient young or old? Young Old Vaccinated against the measles? Smoked for more than 10 years? Yes No Yes No It s a decision tree!!!
11 Define the tree A B C D E F G
12 Define the tree A Nodes B C D E F G
13 Define the tree A Edges B C D E F G
14 Define the tree Root A B C D E F G
15 Define the tree Root A B C D E F G Leafs
16 Define the tree Root A Children of A B C Children of B, C Grandchildren of A D E F G
17 Define the tree Root A Children of A B C Children of B, C Grandchildren of A D E F G Leafs
18 Questions to ask age <= 18 yes no vaccinated smoked yes no yes no not sick sick sick not sick
19 Categorical feature Can be a feature test on itself travel_class: coach, business or first travel_class coach first business
20 Classifying with the tree Observation: patient of 40 years, vaccinated and didn t smoke age <= 18 yes no vaccinated smoked yes no yes no not sick sick sick not sick
21 Classifying with the tree Observation: patient of 40 years, vaccinated and didn t smoke age <= 18 yes no vaccinated smoked yes no yes no not sick sick sick not sick
22 Classifying with the tree Observation: patient of 40 years, vaccinated and didn t smoke age <= 18 yes no vaccinated smoked yes no yes no not sick sick sick not sick
23 Classifying with the tree Observation: patient of 40 years, vaccinated and didn t smoke age <= 18 yes no vaccinated smoked yes no yes no not sick sick sick not sick
24 Classifying with the tree Observation: patient of 40 years, vaccinated and didn t smoke age <= 18 yes no vaccinated smoked yes no yes no not sick sick sick not sick
25 Classifying with the tree Observation: patient of 40 years, vaccinated and didn t smoke age <= 18 yes no vaccinated smoked yes no yes no not sick sick sick not sick Prediction: not sick
26 Learn a tree Use training set Come up with queries (feature tests) at each node
27 training set Split into parts 2 parts for binary test age <= 18 yes no part of training set part of training set TRUE FALSE
28 part of training set part of training set feature test feature test yes no yes no part of training set part of training set part of training set part of training set
29 part of training set part of training set part of training set part of training set keep splitting until leafs contain small portion of training set
30 Learn the tree Goal: end up with pure leafs leafs that contain observations of one particular class leaf part of training set class 1 class class 2
31 Learn the tree Goal: end up with pure leafs leafs that contain observations of one particular class In practice: almost never the case noise leaf When classifying new instances end up in leaf part of training set class 1 class class 2
32 Learn the tree Goal: end up with pure leafs leafs that contain observations of one particular class In practice: almost never the case noise leaf When classifying new instances end up in leaf part of training set class 1 class 2 assign class of majority of training instances
33 Learn the tree At each node Iterate over different feature tests Choose the best one Comes down to two parts Make list of feature tests Choose test with best split
34 Construct list of tests Categorical features Parents/grandparents/ didn t use the test yet Numerical features Choose feature Choose threshold
35 Choose best feature test More complex Use splitting criteria to decide which test to use Information gain ~ entropy
36 Information gain Information gained from split based on feature test Test leads to nicely divided classes -> high information gain Test leads to scrambled classes -> low information gain Test with highest information gain will be chosen
37 Pruning Number of nodes influences chance on overfit Restrict size higher bias Decrease chance on overfit Pruning the tree
38 INTRODUCTION TO MACHINE LEARNING Let s practice!
39 INTRODUCTION TO MACHINE LEARNING k-nearest Neighbors
40 Instance-based learning Save training set in memory No real model like decision tree Compare unseen instances to training set Predict using the comparison of unseen data and the training set
41 k-nearest Neighbor Form of instance-based learning Simplest form: 1-Nearest Neighbor or Nearest Neighbor
42 Nearest Neighbor - example 2 features: X1 and X2 Class: red or blue Binary classification
43 Nearest Neighbor - example Introduction to Machine Learning
44 Nearest Neighbor - example Save complete training set
45 Nearest Neighbor - example Save complete training set Given: unseen observation with features X = (1.3, -2)
46 Nearest Neighbor - example Save complete training set Given: unseen observation with features X = (1.3, -2) Compare training set with new observation
47 Nearest Neighbor - example Save complete training set Given: unseen observation with features X = (1.3, -2) Compare training set with new observation Find closest observation nearest neighbor and assign same class just Euclidean distance, nothing fancy
48 k-nearest Neighbors k is the amount of neighbors If k = 5 Use 5 most similar observations (neighbors) Assigned class will be the most represented class within the 5 neighbors
49 Distance metric Important aspect of k-nn
50 Distance metric Important aspect of k-nn Euclidian distance:
51 Distance metric Important aspect of k-nn Euclidian distance: Manhattan distance:
52 Scaling - example Dataset with 2 features: weight and height 3 observations height (m) weight (kg) distance: 0.5 distance: 0.13
53 Scaling - example Dataset with 2 features: weight and height 3 observations Scale influences distance! height (cm) weight (kg) distance: 0.5 distance: 13
54 Scaling Normalize all features e.g. rescale values between 0 and 1 Gives better measure of real distance Don t forget to scale new observations
55 Categorical features How to use in distance metric? Dummy variables 1 categorical features with N possible outcomes to N binary features (2 outcomes)
56 Dummy variables Example mother tongue: Spanish, Italian or French mother_tongue Spanish Italian Italian Spanish French French French spanish italian french
57 INTRODUCTION TO MACHINE LEARNING Let s practice!
58 INTRODUCTION TO MACHINE LEARNING Introducing: The ROC curve
59 Introducing Very powerful performance measure For binary classification Reiceiver Operator Characteristic Curve (ROC Curve)
60 Probabilities as output Used decision trees and k-nn to predict class They can also output probability that instance belongs to class
61 Probabilities as output - example Binary classification Decide whether patient is sick or not sick decision function! Define probability threshold from which you decide patient to be sick Avoid sending sick patient home: lower threshold to 30% Decision tree: New patient: 70% 30% More patients classified as higher than 50% classify as More but also patients classified as
62 Confusion matrix Other performance measure for classification Important to construct the ROC curve
63 Confusion matrix Binary classifier: positive or negative (1 or 0) Prediction P N Truth p TP FN n FP TN
64 Confusion matrix Binary classifier: positive or negative (1 or 0) Prediction True Positives Prediction: P Truth: P Truth P N p TP FN n FP TN
65 Confusion matrix Binary classifier: positive or negative (1 or 0) Prediction False Negatives Prediction: N Truth: P Truth P N p TP FN n FP TN
66 Confusion matrix Binary classifier: positive or negative (1 or 0) Prediction False Positives Prediction: P Truth: N Truth P N p TP FN n FP TN
67 Confusion matrix Binary classifier: positive or negative (1 or 0) Prediction True Negatives Prediction: N Truth: N Truth P N p TP FN n FP TN
68 Ratios in the confusion matrix True positive rate (TPR) = recall False positive rate (FPR) Prediction TPR TP/(TP+FN) P N Truth p TP FN n FP TN
69 Ratios in the confusion matrix True positive rate (TPR) = recall False positive rate (FPR) Prediction Truly TPR TP/(TP+FN) P N p TP FN + Truth n FP TN Truly Falsely
70 Ratios in the confusion matrix True positive rate (TPR) = recall False positive rate (FPR) Prediction FPR FP/(FP+TN) P N Truth p TP FN n FP TN
71 Ratios in the confusion matrix True positive rate (TPR) = recall False positive rate (FPR) Prediction Falsely FPR FP/(FP+TN) P N p TP FN + Truth n FP TN Falsely Truly
72 ROC curve Horizontal axis: FPR Vertical axis: TPR How to draw the curve? True positive rate False positive rate
73 Draw the curve Need classifier which outputs probabilities The decision function probability decide to diagnose threshold by decision function probability
74 Draw the curve Need classifier which outputs probabilities The decision function probability decide to diagnose threshold by decision function probability
75 True positive rate >=50%: sick < 50%: healthy False positive rate 50% probability
76 True positive rate all sick False positive rate 0% probability
77 True positive rate all healthy False positive rate 100% probability
78 Interpreting the curve Is it a good curve? Closer to left upper corner = better Good classifiers have big area under the curve True positive rate False positive rate
79 > 0.9 = very good True positive rate AUC = False positive rate Area under the curve (AUC)
80 INTRODUCTION TO MACHINE LEARNING Let s practice!
An Introduction to ROC curves. Mark Whitehorn. Mark Whitehorn
An Introduction to ROC curves Mark Whitehorn Mark Whitehorn It s all about me Prof. Mark Whitehorn Emeritus Professor of Analytics Computing University of Dundee Consultant Writer (author) m.a.f.whitehorn@dundee.ac.uk
More informationOutlier Analysis. Lijun Zhang
Outlier Analysis Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Extreme Value Analysis Probabilistic Models Clustering for Outlier Detection Distance-Based Outlier Detection Density-Based
More informationAn Improved Patient-Specific Mortality Risk Prediction in ICU in a Random Forest Classification Framework
An Improved Patient-Specific Mortality Risk Prediction in ICU in a Random Forest Classification Framework Soumya GHOSE, Jhimli MITRA 1, Sankalp KHANNA 1 and Jason DOWLING 1 1. The Australian e-health and
More informationWeek 2 Video 3. Diagnostic Metrics
Week 2 Video 3 Diagnostic Metrics Different Methods, Different Measures Today we ll continue our focus on classifiers Later this week we ll discuss regressors And other methods will get worked in later
More information4. Model evaluation & selection
Foundations of Machine Learning CentraleSupélec Fall 2017 4. Model evaluation & selection Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr
More informationAssessment of diabetic retinopathy risk with random forests
Assessment of diabetic retinopathy risk with random forests Silvia Sanromà, Antonio Moreno, Aida Valls, Pedro Romero2, Sofia de la Riva2 and Ramon Sagarra2* Departament d Enginyeria Informàtica i Matemàtiques
More informationPredictive performance and discrimination in unbalanced classification
MASTER Predictive performance and discrimination in unbalanced classification van der Zon, S.B. Award date: 2016 Link to publication Disclaimer This document contains a student thesis (bachelor's or master's),
More informationPredicting Breast Cancer Survivability Rates
Predicting Breast Cancer Survivability Rates For data collected from Saudi Arabia Registries Ghofran Othoum 1 and Wadee Al-Halabi 2 1 Computer Science, Effat University, Jeddah, Saudi Arabia 2 Computer
More informationAn Empirical and Formal Analysis of Decision Trees for Ranking
An Empirical and Formal Analysis of Decision Trees for Ranking Eyke Hüllermeier Department of Mathematics and Computer Science Marburg University 35032 Marburg, Germany eyke@mathematik.uni-marburg.de Stijn
More informationDottorato di Ricerca in Statistica Biomedica. XXVIII Ciclo Settore scientifico disciplinare MED/01 A.A. 2014/2015
UNIVERSITA DEGLI STUDI DI MILANO Facoltà di Medicina e Chirurgia Dipartimento di Scienze Cliniche e di Comunità Sezione di Statistica Medica e Biometria "Giulio A. Maccacaro Dottorato di Ricerca in Statistica
More informationReview: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections
Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections New: Bias-variance decomposition, biasvariance tradeoff, overfitting, regularization, and feature selection Yi
More informationLearning Decision Trees Using the Area Under the ROC Curve
Learning Decision rees Using the Area Under the ROC Curve Cèsar erri, Peter lach 2, José Hernández-Orallo Dep. de Sist. Informàtics i Computació, Universitat Politècnica de València, Spain 2 Department
More informationPerformance Analysis of Different Classification Methods in Data Mining for Diabetes Dataset Using WEKA Tool
Performance Analysis of Different Classification Methods in Data Mining for Diabetes Dataset Using WEKA Tool Sujata Joshi Assistant Professor, Dept. of CSE Nitte Meenakshi Institute of Technology Bangalore,
More informationCHAPTER 5 DECISION TREE APPROACH FOR BONE AGE ASSESSMENT
53 CHAPTER 5 DECISION TREE APPROACH FOR BONE AGE ASSESSMENT The decision tree approach for BAA makes use of the radius and ulna wrist bones to estimate the bone age. From the radius and ulna bones, 11
More informationKnowledge Discovery and Data Mining. Testing. Performance Measures. Notes. Lecture 15 - ROC, AUC & Lift. Tom Kelsey. Notes
Knowledge Discovery and Data Mining Lecture 15 - ROC, AUC & Lift Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-17-AUC
More informationAnalysis of Diabetic Dataset and Developing Prediction Model by using Hive and R
Indian Journal of Science and Technology, Vol 9(47), DOI: 10.17485/ijst/2016/v9i47/106496, December 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Analysis of Diabetic Dataset and Developing Prediction
More informationPredictive Models for Healthcare Analytics
Predictive Models for Healthcare Analytics A Case on Retrospective Clinical Study Mengling Mornin Feng mfeng@mit.edu mornin@gmail.com 1 Learning Objectives After the lecture, students should be able to:
More informationReview. Imagine the following table being obtained as a random. Decision Test Diseased Not Diseased Positive TP FP Negative FN TN
Outline 1. Review sensitivity and specificity 2. Define an ROC curve 3. Define AUC 4. Non-parametric tests for whether or not the test is informative 5. Introduce the binormal ROC model 6. Discuss non-parametric
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2013/01/08 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization
More information3. Model evaluation & selection
Foundations of Machine Learning CentraleSupélec Fall 2016 3. Model evaluation & selection Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr
More informationBMI 541/699 Lecture 16
BMI 541/699 Lecture 16 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Proportions & contingency tables -
More informationMachine learning II. Juhan Ernits ITI8600
Machine learning II Juhan Ernits ITI8600 Hand written digit recognition 64 Example 2: Face recogition Classification, regression or unsupervised? How many classes? Example 2: Face recognition Classification,
More informationMETHODS FOR DETECTING CERVICAL CANCER
Chapter III METHODS FOR DETECTING CERVICAL CANCER 3.1 INTRODUCTION The successful detection of cervical cancer in a variety of tissues has been reported by many researchers and baseline figures for the
More informationA Predictive Chronological Model of Multiple Clinical Observations T R A V I S G O O D W I N A N D S A N D A M. H A R A B A G I U
A Predictive Chronological Model of Multiple Clinical Observations T R A V I S G O O D W I N A N D S A N D A M. H A R A B A G I U T H E U N I V E R S I T Y O F T E X A S A T D A L L A S H U M A N L A N
More informationEfficient AUC Optimization for Information Ranking Applications
Efficient AUC Optimization for Information Ranking Applications Sean J. Welleck IBM, USA swelleck@us.ibm.com Abstract. Adequate evaluation of an information retrieval system to estimate future performance
More informationProbability Revision. MED INF 406 Assignment 5. Golkonda, Jyothi 11/4/2012
Probability Revision MED INF 406 Assignment 5 Golkonda, Jyothi 11/4/2012 Problem Statement Assume that the incidence for Lyme disease in the state of Connecticut is 78 cases per 100,000. A diagnostic test
More informationComparing Two ROC Curves Independent Groups Design
Chapter 548 Comparing Two ROC Curves Independent Groups Design Introduction This procedure is used to compare two ROC curves generated from data from two independent groups. In addition to producing a
More informationSensitivity, Specificity, and Relatives
Sensitivity, Specificity, and Relatives Brani Vidakovic ISyE 6421/ BMED 6700 Vidakovic, B. Se Sp and Relatives January 17, 2017 1 / 26 Overview Today: Vidakovic, B. Se Sp and Relatives January 17, 2017
More informationPrediction Models of Diabetes Diseases Based on Heterogeneous Multiple Classifiers
Int. J. Advance Soft Compu. Appl, Vol. 10, No. 2, July 2018 ISSN 2074-8523 Prediction Models of Diabetes Diseases Based on Heterogeneous Multiple Classifiers I Gede Agus Suwartane 1, Mohammad Syafrullah
More informationVU Biostatistics and Experimental Design PLA.216
VU Biostatistics and Experimental Design PLA.216 Julia Feichtinger Postdoctoral Researcher Institute of Computational Biotechnology Graz University of Technology Outline for Today About this course Background
More informationPersonalized Colorectal Cancer Survivability Prediction with Machine Learning Methods*
Personalized Colorectal Cancer Survivability Prediction with Machine Learning Methods* 1 st Samuel Li Princeton University Princeton, NJ seli@princeton.edu 2 nd Talayeh Razzaghi New Mexico State University
More informationClassification of benign and malignant masses in breast mammograms
Classification of benign and malignant masses in breast mammograms A. Šerifović-Trbalić*, A. Trbalić**, D. Demirović*, N. Prljača* and P.C. Cattin*** * Faculty of Electrical Engineering, University of
More informationZheng Yao Sr. Statistical Programmer
ROC CURVE ANALYSIS USING SAS Zheng Yao Sr. Statistical Programmer Outline Background Examples: Accuracy assessment Compare ROC curves Cut-off point selection Summary 2 Outline Background Examples: Accuracy
More informationPredicting Breast Cancer Survival Using Treatment and Patient Factors
Predicting Breast Cancer Survival Using Treatment and Patient Factors William Chen wchen808@stanford.edu Henry Wang hwang9@stanford.edu 1. Introduction Breast cancer is the leading type of cancer in women
More informationPERFORMANCE MEASURES
PERFORMANCE MEASURES Of predictive systems DATA TYPES Binary Data point Value A FALSE B TRUE C TRUE D FALSE E FALSE F TRUE G FALSE Real Value Data Point Value a 32.3 b.2 b 2. d. e 33 f.65 g 72.8 ACCURACY
More informationPerformance Evaluation of Machine Learning Algorithms in the Classification of Parkinson Disease Using Voice Attributes
Performance Evaluation of Machine Learning Algorithms in the Classification of Parkinson Disease Using Voice Attributes J. Sujatha Research Scholar, Vels University, Assistant Professor, Post Graduate
More informationData that can be classified as belonging to a distinct number of categories >>result in categorical responses. And this includes:
This sheets starts from slide #83 to the end ofslide #4. If u read this sheet you don`t have to return back to the slides at all, they are included here. Categorical Data (Qualitative data): Data that
More informationPrediction of micrornas and their targets
Prediction of micrornas and their targets Introduction Brief history mirna Biogenesis Computational Methods Mature and precursor mirna prediction mirna target gene prediction Summary micrornas? RNA can
More informationWhite Paper Estimating Complex Phenotype Prevalence Using Predictive Models
White Paper 23-12 Estimating Complex Phenotype Prevalence Using Predictive Models Authors: Nicholas A. Furlotte Aaron Kleinman Robin Smith David Hinds Created: September 25 th, 2015 September 25th, 2015
More informationSTATISTICAL METHODS FOR DIAGNOSTIC TESTING: AN ILLUSTRATION USING A NEW METHOD FOR CANCER DETECTION XIN SUN. PhD, Kansas State University, 2012
STATISTICAL METHODS FOR DIAGNOSTIC TESTING: AN ILLUSTRATION USING A NEW METHOD FOR CANCER DETECTION by XIN SUN PhD, Kansas State University, 2012 A THESIS Submitted in partial fulfillment of the requirements
More informationIntroduction to Discrimination in Microarray Data Analysis
Introduction to Discrimination in Microarray Data Analysis Jane Fridlyand CBMB University of California, San Francisco Genentech Hall Auditorium, Mission Bay, UCSF October 23, 2004 1 Case Study: Van t
More informationMachine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017
Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017 A.K.A. Artificial Intelligence Unsupervised learning! Cluster analysis Patterns, Clumps, and Joining
More informationModifying ROC Curves to Incorporate Predicted Probabilities
Modifying ROC Curves to Incorporate Predicted Probabilities C. Ferri, P. Flach 2, J. Hernández-Orallo, A. Senad Departament de Sistemes Informàtics i Computació Universitat Politècnica de València Spain
More informationClassification of EEG signals in an Object Recognition task
Classification of EEG signals in an Object Recognition task Iacob D. Rus, Paul Marc, Mihaela Dinsoreanu, Rodica Potolea Technical University of Cluj-Napoca Cluj-Napoca, Romania 1 rus_iacob23@yahoo.com,
More informationEnhanced Detection of Lung Cancer using Hybrid Method of Image Segmentation
Enhanced Detection of Lung Cancer using Hybrid Method of Image Segmentation L Uma Maheshwari Department of ECE, Stanley College of Engineering and Technology for Women, Hyderabad - 500001, India. Udayini
More informationUnit 7 Comparisons and Relationships
Unit 7 Comparisons and Relationships Objectives: To understand the distinction between making a comparison and describing a relationship To select appropriate graphical displays for making comparisons
More informationApplying Data Mining for Epileptic Seizure Detection
Applying Data Mining for Epileptic Seizure Detection Ying-Fang Lai 1 and Hsiu-Sen Chiang 2* 1 Department of Industrial Education, National Taiwan Normal University 162, Heping East Road Sec 1, Taipei,
More informationVarious performance measures in Binary classification An Overview of ROC study
Various performance measures in Binary classification An Overview of ROC study Suresh Babu. Nellore Department of Statistics, S.V. University, Tirupati, India E-mail: sureshbabu.nellore@gmail.com Abstract
More informationThe area under the ROC curve as a variable selection criterion for multiclass classification problems
The area under the ROC curve as a variable selection criterion for multiclass classification problems M. de Figueiredo 1,2 C. B. Y. Cordella 3 D. Jouan-Rimbaud Bouveresse 1,3 X. Archer 2 J.-M. Bégué 2
More informationImproving k Nearest Neighbor with Exemplar Generalization for Imbalanced Classification
Improving k Nearest Neighbor with Exemplar Generalization for Imbalanced Classification Yuxuan Li and Xiuzhen Zhang School of Computer Science and Information Technology, RMIT University, Melbourne, Australia
More informationDiagnostic tests, Laboratory tests
Diagnostic tests, Laboratory tests I. Introduction II. III. IV. Informational values of a test Consequences of the prevalence rate Sequential use of 2 tests V. Selection of a threshold: the ROC curve VI.
More informationPREDICTION OF BREAST CANCER USING STACKING ENSEMBLE APPROACH
PREDICTION OF BREAST CANCER USING STACKING ENSEMBLE APPROACH 1 VALLURI RISHIKA, M.TECH COMPUTER SCENCE AND SYSTEMS ENGINEERING, ANDHRA UNIVERSITY 2 A. MARY SOWJANYA, Assistant Professor COMPUTER SCENCE
More informationCOMP90049 Knowledge Technologies
COMP90049 Knowledge Technologies Introduction Classification (Lecture Set4) 2017 Rao Kotagiri School of Computing and Information Systems The Melbourne School of Engineering Some of slides are derived
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 10: Introduction to inference (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 17 What is inference? 2 / 17 Where did our data come from? Recall our sample is: Y, the vector
More informationAutomatic Detection of Epileptic Seizures in EEG Using Machine Learning Methods
Automatic Detection of Epileptic Seizures in EEG Using Machine Learning Methods Ying-Fang Lai 1 and Hsiu-Sen Chiang 2* 1 Department of Industrial Education, National Taiwan Normal University 162, Heping
More informationAnalysis of Classification Algorithms towards Breast Tissue Data Set
Analysis of Classification Algorithms towards Breast Tissue Data Set I. Ravi Assistant Professor, Department of Computer Science, K.R. College of Arts and Science, Kovilpatti, Tamilnadu, India Abstract
More informationExample - Birdkeeping and Lung Cancer - Interpretation. Lecture 20 - Sensitivity, Specificity, and Decisions. What do the numbers not mean...
Odds Ratios Example - Birdkeeping and Lung Cancer - Interpretation Lecture 20 - Sensitivity, Specificity, and Decisions Sta102 / BME102 Colin Rundel April 16, 2014 Estimate Std. Error z value Pr(> z )
More informationImproved Intelligent Classification Technique Based On Support Vector Machines
Improved Intelligent Classification Technique Based On Support Vector Machines V.Vani Asst.Professor,Department of Computer Science,JJ College of Arts and Science,Pudukkottai. Abstract:An abnormal growth
More informationAN INVESTIGATION OF MULTI-LABEL CLASSIFICATION TECHNIQUES FOR PREDICTING HIV DRUG RESISTANCE IN RESOURCE-LIMITED SETTINGS
AN INVESTIGATION OF MULTI-LABEL CLASSIFICATION TECHNIQUES FOR PREDICTING HIV DRUG RESISTANCE IN RESOURCE-LIMITED SETTINGS by Pascal Brandt Submitted in fulfilment of the academic requirements for the degree
More informationClassification of Thyroid Nodules in Ultrasound Images using knn and Decision Tree
Classification of Thyroid Nodules in Ultrasound Images using knn and Decision Tree Gayana H B 1, Nanda S 2 1 IV Sem, M.Tech, Biomedical Signal processing & Instrumentation, SJCE, Mysuru, Karnataka, India
More informationClassification with microarray data
Classification with microarray data Aron Charles Eklund eklund@cbs.dtu.dk DNA Microarray Analysis - #27612 January 8, 2010 The rest of today Now: What is classification, and why do we do it? How to develop
More informationAUTOMATING NEUROLOGICAL DISEASE DIAGNOSIS USING STRUCTURAL MR BRAIN SCAN FEATURES
AUTOMATING NEUROLOGICAL DISEASE DIAGNOSIS USING STRUCTURAL MR BRAIN SCAN FEATURES ALLAN RAVENTÓS AND MOOSA ZAIDI Stanford University I. INTRODUCTION Nine percent of those aged 65 or older and about one
More informationComparison of discrimination methods for the classification of tumors using gene expression data
Comparison of discrimination methods for the classification of tumors using gene expression data Sandrine Dudoit, Jane Fridlyand 2 and Terry Speed 2,. Mathematical Sciences Research Institute, Berkeley
More informationPredictive Model for Detection of Colorectal Cancer in Primary Care by Analysis of Complete Blood Counts
Predictive Model for Detection of Colorectal Cancer in Primary Care by Analysis of Complete Blood Counts Kinar, Y., Kalkstein, N., Akiva, P., Levin, B., Half, E.E., Goldshtein, I., Chodick, G. and Shalev,
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016 Exam policy: This exam allows one one-page, two-sided cheat sheet; No other materials. Time: 80 minutes. Be sure to write your name and
More informationOCW Epidemiology and Biostatistics, 2010 Michael D. Kneeland, MD November 18, 2010 SCREENING. Learning Objectives for this session:
OCW Epidemiology and Biostatistics, 2010 Michael D. Kneeland, MD November 18, 2010 SCREENING Learning Objectives for this session: 1) Know the objectives of a screening program 2) Define and calculate
More informationBehavioral Data Mining. Lecture 4 Measurement
Behavioral Data Mining Lecture 4 Measurement Outline Hypothesis testing Parametric statistical tests Non-parametric tests Precision-Recall plots ROC plots Hardware update Icluster machines are ready for
More informationApplying One-vs-One and One-vs-All Classifiers in k-nearest Neighbour Method and Support Vector Machines to an Otoneurological Multi-Class Problem
Oral Presentation at MIE 2011 30th August 2011 Oslo Applying One-vs-One and One-vs-All Classifiers in k-nearest Neighbour Method and Support Vector Machines to an Otoneurological Multi-Class Problem Kirsi
More informationClassification. Methods Course: Gene Expression Data Analysis -Day Five. Rainer Spang
Classification Methods Course: Gene Expression Data Analysis -Day Five Rainer Spang Ms. Smith DNA Chip of Ms. Smith Expression profile of Ms. Smith Ms. Smith 30.000 properties of Ms. Smith The expression
More informationDecision Support System for Heart Disease Diagnosing Using K-NN Algorithm
Decision Support System for Heart Disease Diagnosing Using K-NN Algorithm Tito Yuwono Department of Electrical Engineering Islamic University of Indonesia Yogyakarta Address: Kaliurang Street KM 14 Yogyakarta,
More informationDetection of Glaucoma and Diabetic Retinopathy from Fundus Images by Bloodvessel Segmentation
International Journal of Engineering and Advanced Technology (IJEAT) ISSN: 2249 8958, Volume-5, Issue-5, June 2016 Detection of Glaucoma and Diabetic Retinopathy from Fundus Images by Bloodvessel Segmentation
More informationThe Application of Image Processing Techniques for Detection and Classification of Cancerous Tissue in Digital Mammograms
The Application of Image Processing Techniques for Detection and Classification of Cancerous Tissue in Digital Mammograms Angayarkanni.N 1, Kumar.D 2 and Arunachalam.G 3 1 Research Scholar Department of
More informationBreast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selection based on Mutual Information
Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selection based on Mutual Information Abeer Alzubaidi abeer.alzubaidi022014@my.ntu.ac.uk David Brown david.brown@ntu.ac.uk Abstract
More informationSupporting Information Identification of Amino Acids with Sensitive Nanoporous MoS 2 : Towards Machine Learning-Based Prediction
Supporting Information Identification of Amino Acids with Sensitive Nanoporous MoS 2 : Towards Machine Learning-Based Prediction Amir Barati Farimani, Mohammad Heiranian, Narayana R. Aluru 1 Department
More informationPredicting Microfinance Participation in Indian Villages
Predicting Microfinance Participation in Indian Villages Govind Manian and Karen Shen December 5, 202 Abstract Using data from a microfinance organization operating in southern Indian villages, we use
More informationMEM BASED BRAIN IMAGE SEGMENTATION AND CLASSIFICATION USING SVM
MEM BASED BRAIN IMAGE SEGMENTATION AND CLASSIFICATION USING SVM T. Deepa 1, R. Muthalagu 1 and K. Chitra 2 1 Department of Electronics and Communication Engineering, Prathyusha Institute of Technology
More information5.3: Associations in Categorical Variables
5.3: Associations in Categorical Variables Now we will consider how to use probability to determine if two categorical variables are associated. Conditional Probabilities Consider the next example, where
More informationBrain Tumor Segmentation Based On a Various Classification Algorithm
Brain Tumor Segmentation Based On a Various Classification Algorithm A.Udhaya Kunam Research Scholar, PG & Research Department of Computer Science, Raja Dooraisingam Govt. Arts College, Sivagangai, TamilNadu,
More informationABSTRACT. classification algorithm, so rules can be extracted from the decision tree models.
ABSTRACT This study analyses the rule extraction of the cardiovascular database using classification algorithm. Heart disease is one of the most common causes of death in the western world and increasing
More informationAutomatic Lung Cancer Detection Using Volumetric CT Imaging Features
Automatic Lung Cancer Detection Using Volumetric CT Imaging Features A Research Project Report Submitted To Computer Science Department Brown University By Dronika Solanki (B01159827) Abstract Lung cancer
More informationKnock Knock, Who s There? Membership Inference on Aggregate Location Data
Knock Knock, Who s There? Membership Inference on Aggregate Location Data Apostolos Pyrgelis University College London apostolos.pyrgelis.14@ucl.ac.uk Carmela Troncoso IMDEA Software Institute carmela.troncoso@imdea.org
More informationWhen Overlapping Unexpectedly Alters the Class Imbalance Effects
When Overlapping Unexpectedly Alters the Class Imbalance Effects V. García 1,2, R.A. Mollineda 2,J.S.Sánchez 2,R.Alejo 1,2, and J.M. Sotoca 2 1 Lab. Reconocimiento de Patrones, Instituto Tecnológico de
More informationReceiver operating characteristic
Receiver operating characteristic From Wikipedia, the free encyclopedia In signal detection theory, a receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot of the sensitivity,
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationProtein Structure & Function. University, Indianapolis, USA 3 Department of Molecular Medicine, University of South Florida, Tampa, USA
Protein Structure & Function Supplement for article entitled MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in
More informationFeature selection methods for early predictive biomarker discovery using untargeted metabolomic data
Feature selection methods for early predictive biomarker discovery using untargeted metabolomic data Dhouha Grissa, Mélanie Pétéra, Marion Brandolini, Amedeo Napoli, Blandine Comte and Estelle Pujos-Guillot
More informationData Mining of Access to Tetanus Toxoid Immunization Among Women of Childbearing Age in Ethiopia
Machine Learning Research 2017; 2(2): 54-60 http://www.sciencepublishinggroup.com/j/mlr doi: 10.11648/j.mlr.20170202.12 Data Mining of Access to Tetanus Toxoid Immunization Among Women of Childbearing
More informationDerivative-Free Optimization for Hyper-Parameter Tuning in Machine Learning Problems
Derivative-Free Optimization for Hyper-Parameter Tuning in Machine Learning Problems Hiva Ghanbari Jointed work with Prof. Katya Scheinberg Industrial and Systems Engineering Department Lehigh University
More informationInternational Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 9, September 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More informationPropensity Score Analysis to compare effects of radiation and surgery on survival time of lung cancer patients from National Cancer Registry (SEER)
Propensity Score Analysis to compare effects of radiation and surgery on survival time of lung cancer patients from National Cancer Registry (SEER) Yan Wu Advisor: Robert Pruzek Epidemiology and Biostatistics
More informationUsing AUC and Accuracy in Evaluating Learning Algorithms
1 Using AUC and Accuracy in Evaluating Learning Algorithms Jin Huang Charles X. Ling Department of Computer Science The University of Western Ontario London, Ontario, Canada N6A 5B7 fjhuang, clingg@csd.uwo.ca
More informationPII: S (96) THE USE OF THE AREA UNDER THE ROC CURVE IN THE EVALUATION OF MACHINE LEARNING ALGORITHMS
Pergamon Pattern Recognition, Vol. 30, No. 7, pp. 1145-1159, 1997 1997 Pattern Recognition Society. Published by Elsevier Science Ltd Printed in Great Britain. All rights reserved 0031-3203/97 $17.00+.00
More informationObjectives. Quantifying the quality of hypothesis tests. Type I and II errors. Power of a test. Cautions about significance tests
Objectives Quantifying the quality of hypothesis tests Type I and II errors Power of a test Cautions about significance tests Designing Experiments based on power Evaluating a testing procedure The testing
More informationD8 - Executive Summary
Autonomous Medical Monitoring and Diagnostics AMIGO DOCUMENT N : ISSUE : 1.0 DATE : 01.09.2016 - CSEM PROJECT N : 221-ES.1577 CONTRACT N : 4000113764 /15/F/MOS FUNCTION NAME SIGNATURE DATE Author Expert
More informationThe Impact of Using Regression Models to Build Defect Classifiers
The Impact of Using Regression Models to Build Defect Classifiers Gopi Krishnan Rajbahadur, Shaowei Wang Queen s University, Canada {krishnan, shaowei}@cs.queensu.ca Yasutaka Kamei Kyushu University, Japan
More informationStatistics, Probability and Diagnostic Medicine
Statistics, Probability and Diagnostic Medicine Jennifer Le-Rademacher, PhD Sponsored by the Clinical and Translational Science Institute (CTSI) and the Department of Population Health / Division of Biostatistics
More informationA PRACTICAL APPROACH FOR IMPLEMENTING THE PROBABILITY OF LIQUEFACTION IN PERFORMANCE BASED DESIGN
A PRACTICAL APPROACH FOR IMPLEMENTING THE PROBABILITY OF LIQUEFACTION IN PERFORMANCE BASED DESIGN Thomas Oommen, Ph.D. Candidate, Department of Civil and Environmental Engineering, Tufts University, 113
More informationComparing Multifunctionality and Association Information when Classifying Oncogenes and Tumor Suppressor Genes
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationDetection of Lung Cancer Using Marker-Controlled Watershed Transform
2015 International Conference on Pervasive Computing (ICPC) Detection of Lung Cancer Using Marker-Controlled Watershed Transform Sayali Satish Kanitkar M.E. (Signal Processing), Sinhgad College of engineering,
More informationOCW Epidemiology and Biostatistics, 2010 David Tybor, MS, MPH and Kenneth Chui, PhD Tufts University School of Medicine October 27, 2010
OCW Epidemiology and Biostatistics, 2010 David Tybor, MS, MPH and Kenneth Chui, PhD Tufts University School of Medicine October 27, 2010 SAMPLING AND CONFIDENCE INTERVALS Learning objectives for this session:
More information