Data Mining in Bioinformatics Day 4: Text Mining
Karsten Borgwardt
February 25 to March 10
Bioinformatics Group, MPIs Tübingen
What is text mining?

Definition: Text mining is the use of automated methods for exploiting the enormous amount of knowledge available in the (biomedical) literature.

Motivation:
- Most knowledge is stored as text, both in industry and in academia.
- This alone makes text mining an integral part of knowledge discovery.
- Furthermore, to make text machine-readable, one has to solve several recognition (mining) tasks on text.
What is text mining? Common tasks

- Information retrieval: find documents that are relevant to a user, or to a query, in a collection of documents.
  - Document ranking: rank all documents in the collection.
  - Document selection: classify documents into relevant and irrelevant.
- Information filtering: search newly created documents for information that is relevant to a user.
- Document classification: assign a document to a category that describes its content.
- Keyword co-occurrence: find groups of keywords that co-occur in many documents.
Evaluating text mining: precision and recall

Let the set of documents that are relevant to a query be denoted as {Relevant} and the set of retrieved documents as {Retrieved}.

The precision is the percentage of retrieved documents that are relevant to the query:

$$\text{precision} = \frac{|\{\text{Relevant}\} \cap \{\text{Retrieved}\}|}{|\{\text{Retrieved}\}|} \qquad (1)$$

The recall is the percentage of relevant documents that were retrieved by the query:

$$\text{recall} = \frac{|\{\text{Relevant}\} \cap \{\text{Retrieved}\}|}{|\{\text{Relevant}\}|} \qquad (2)$$
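As a small illustration of the two measures, the sketch below computes precision and recall from two sets of document identifiers. The function name `precision_recall` and the example identifiers are made up for this example; returning 0.0 for an empty set is a convention chosen here, not something the slides specify.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for two collections of document ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = relevant & retrieved  # documents that are both relevant and retrieved
    # Assumed convention: an empty denominator yields 0.0.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: 4 relevant documents, 3 retrieved, 2 in common.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d3", "d4", "d5"}
p, r = precision_recall(relevant, retrieved)
print(f"precision={p:.3f} recall={r:.3f}")
```

Here two of the three retrieved documents are relevant (precision 2/3), and two of the four relevant documents were found (recall 1/2).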
Text representation: tokenization

- Tokenization is the process of identifying keywords in a document.
- Not all words in a text are relevant.
- Text mining ignores stop words, that is, very frequent words such as "the" or "of" that carry little content.
- Stop words form the stop list.
- Stop lists are context-dependent.
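A minimal sketch of tokenization with a stop list might look as follows. The stop list `STOP_WORDS`, the regular expression, and the example sentence are illustrative assumptions; a real stop list would be chosen for the corpus at hand, since stop lists are context-dependent.

```python
import re

# Illustrative stop list (an assumption for this example, not from the slides).
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "and", "to"}


def tokenize(text, stop_words=STOP_WORDS):
    """Lower-case the text, split it into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stop_words]


tokens = tokenize("The role of BRCA1 in the repair of DNA")
print(tokens)
```

In a biomedical setting one might additionally put very common domain words on the stop list, which is why the stop list is passed in as a parameter.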
Text representation: vector space model

- Given #d documents and #t terms.
- Model each document as a vector v in a #t-dimensional space.

Weighted term-frequency matrix:
- Matrix TF of size #d × #t.
- Entries measure the association of term and document.
- If a term t does not occur in a document d, then TF(d, t) = 0.
- If a term t does occur in a document d, then TF(d, t) > 0.
Text representation: term-frequency weights

If term t occurs in document d, then TF(d, t) can be defined in one of several ways:

- $TF(d, t) = 1$
- $TF(d, t) = \text{freq}(d, t)$, the frequency of t in d
- $TF(d, t) = \frac{\text{freq}(d, t)}{\sum_{t' \in T} \text{freq}(d, t')}$
- $TF(d, t) = 1 + \log(1 + \log(\text{freq}(d, t)))$
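The four term-frequency weightings can be compared side by side with a short sketch. The function name `tf_variants` and the dictionary keys are hypothetical labels introduced for this example; natural logarithms are assumed, since the slides do not state the base.

```python
import math
from collections import Counter


def tf_variants(tokens, term):
    """Return the four TF weightings of `term` in one tokenized document."""
    counts = Counter(tokens)
    freq = counts[term]
    if freq == 0:
        # A term that does not occur in the document always gets weight 0.
        return {"binary": 0.0, "raw": 0.0, "relative": 0.0, "log": 0.0}
    return {
        "binary": 1.0,                                    # TF = 1
        "raw": float(freq),                               # TF = freq(d, t)
        "relative": freq / sum(counts.values()),          # freq normalised by document length
        "log": 1.0 + math.log(1.0 + math.log(freq)),      # dampened frequency
    }


doc = ["gene", "protein", "gene", "kinase"]
weights = tf_variants(doc, "gene")
print(weights)
```

The logarithmic variant grows very slowly with the raw count, so a term that appears 100 times is not treated as 100 times more important than a term that appears once.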
Text representation: inverse document frequency

- The inverse document frequency (IDF) represents the scaling factor, or importance, of a term.
- A term that appears in many documents is scaled down:

$$IDF(t) = \log \frac{1 + |d|}{|d_t|} \qquad (3)$$

where $|d|$ is the number of all documents, and $|d_t|$ is the number of documents containing term t.

TF-IDF measure: the product of term frequency and inverse document frequency:

$$TF\text{-}IDF(d, t) = TF(d, t) \cdot IDF(t) \qquad (4)$$
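A minimal sketch of the TF-IDF computation, assuming IDF(t) = log((1 + |d|) / |d_t|) with natural logarithms and the raw count freq(d, t) as the term frequency. The function name `tf_idf` and the toy documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns (per-document weight dicts, idf dict)."""
    n_docs = len(docs)
    # |d_t|: number of documents that contain term t at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / df) for t, df in doc_freq.items()}
    weights = []
    for doc in docs:
        counts = Counter(doc)  # raw term frequency freq(d, t)
        weights.append({t: f * idf[t] for t, f in counts.items()})
    return weights, idf


docs = [
    ["gene", "protein", "gene"],
    ["gene", "kinase"],
    ["protein", "binding"],
]
weights, idf = tf_idf(docs)
print(idf["gene"], idf["kinase"])
print(weights[0])
```

In this toy corpus "kinase" occurs in one document and "gene" in two, so "kinase" receives the larger IDF and is treated as the more discriminative term.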
Measuring similarity

Cosine measure: Let $v_1$ and $v_2$ be two document vectors. The cosine similarity is defined as

$$sim(v_1, v_2) = \frac{v_1 \cdot v_2}{|v_1|\,|v_2|} \qquad (5)$$

Kernels: Depending on how we represent a document, there are many kernels available for measuring the similarity of these representations:
- vectorial representation: vector kernels such as the linear, polynomial, or Gaussian RBF kernel
- one long string: string kernels that count common k-mers in two strings (more on that later in the course)
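The cosine measure can be sketched in a few lines of plain Python. The function name `cosine_similarity` and the convention of returning 0.0 when one vector is all zeros are choices made for this example.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine of the angle between two equal-length document vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # assumed convention for an empty document
    return dot / (norm1 * norm2)


# Two hypothetical term-count vectors over the same three terms.
s = cosine_similarity([2, 0, 1], [4, 0, 2])
print(s)
```

Because the measure normalises by vector length, a document and a copy of it that is twice as long are treated as maximally similar.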
Keyword co-occurrence

Problem:
- Find sets of keywords that often co-occur.
- Common problem in the biomedical literature: find associations between genes, proteins or other entities using co-occurrence search.
- Keyword co-occurrence search is an instance of a more general problem in data mining, called association rule mining.
Association rules: definitions

- Let $I = \{I_1, I_2, \ldots, I_m\}$ be a set of items (keywords).
- Let $D$ be the database of transactions $T$ (collection of documents).
- A transaction $T \in D$ is a set of items: $T \subseteq I$ (a document is a set of keywords).
- Let $A$ be a set of items: $A \subseteq T$. An association rule is an implication of the form

$$A \subseteq T \Rightarrow B \subseteq T, \qquad (6)$$

where $A, B \subseteq I$ and $A \cap B = \emptyset$.
Association rules: support and confidence

The rule $A \Rightarrow B$ holds in the transaction set $D$ with support $s$, where $s$ is the percentage of transactions in $D$ that contain $A \cup B$:

$$\text{support}(A \Rightarrow B) = \frac{|\{T \in D \mid A \subseteq T \wedge B \subseteq T\}|}{|\{T \in D\}|} \qquad (7)$$

The rule $A \Rightarrow B$ has confidence $c$ in the transaction set $D$, where $c$ is the percentage of transactions in $D$ containing $A$ that also contain $B$:

$$\text{confidence}(A \Rightarrow B) = \frac{|\{T \in D \mid A \subseteq T \wedge B \subseteq T\}|}{|\{T \in D \mid A \subseteq T\}|} \qquad (8)$$
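Support and confidence can be computed directly from their definitions. In the sketch below each document is a Python set of keywords; the function names and the gene symbols in the toy database are illustrative assumptions.

```python
def support(transactions, a, b):
    """Fraction of all transactions that contain every item of A and of B."""
    a, b = set(a), set(b)
    both = sum(1 for t in transactions if a <= t and b <= t)
    return both / len(transactions)


def confidence(transactions, a, b):
    """Fraction of the transactions containing A that also contain B."""
    a, b = set(a), set(b)
    with_a = [t for t in transactions if a <= t]
    if not with_a:
        return 0.0  # assumed convention when A never occurs
    return sum(1 for t in with_a if b <= t) / len(with_a)


# Hypothetical documents, each reduced to its set of keywords.
docs = [
    {"BRCA1", "TP53"},
    {"BRCA1", "TP53", "EGFR"},
    {"BRCA1"},
    {"EGFR"},
]
s = support(docs, {"BRCA1"}, {"TP53"})
c = confidence(docs, {"BRCA1"}, {"TP53"})
print(s, c)
```

In this toy database the rule BRCA1 ⇒ TP53 holds in 2 of 4 documents (support 0.5), and in 2 of the 3 documents that mention BRCA1 (confidence 2/3).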
Association rules: strong rules

Rules that satisfy both a minimum support threshold (minsup) and a minimum confidence threshold (minconf) are called strong association rules, and these are the ones we are after.

Finding strong rules:
1. Search for all frequent itemsets (sets of items that occur in at least minsup % of all transactions).
2. Generate strong association rules from the frequent itemsets.
Association rules: Apriori algorithm

The algorithm makes use of the Apriori property: If an itemset $A$ is frequent, then any subset $B$ of $A$ ($B \subseteq A$) is frequent as well. If $B$ is infrequent, then any superset $A$ of $B$ ($A \supseteq B$) is infrequent as well.

Steps:
1. Determine the frequent items, that is, the frequent k-itemsets with k = 1.
2. Join all pairs of frequent k-itemsets that differ in at most 1 item. These are the candidates $C_{k+1}$ for being frequent (k+1)-itemsets.
3. Check the frequency of these candidates $C_{k+1}$: the frequent ones form the frequent (k+1)-itemsets (trick: immediately discard any candidate that contains an infrequent k-itemset).
4. Repeat from Step 2 until no more candidates are frequent.
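The steps above can be sketched as a short, unoptimised Python function. This is a sketch under stated assumptions rather than a reference implementation: the name `apriori`, the use of a fractional `minsup`, and the toy transactions are all choices made for this example.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {itemset: support} for all itemsets with support >= minsup (a fraction)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Step 3: count each candidate and keep the frequent ones.
        result = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= minsup:
                result[c] = s
        return result

    # Step 1: frequent 1-itemsets.
    level = frequent({frozenset([i]) for t in transactions for i in t})
    all_frequent = dict(level)
    k = 1
    while level:
        # Step 2: join pairs of frequent k-itemsets that differ in one item.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        # Pruning trick: drop candidates that contain an infrequent k-itemset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k))
        }
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1  # Step 4: repeat with the next itemset size.
    return all_frequent


transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
itemsets = apriori(transactions, minsup=0.5)
for items, s in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(items), s)
```

With `minsup=0.5`, all three single items and all three pairs are frequent, while {A, B, C} occurs in only 2 of 5 transactions and is rejected, so the loop stops after the third level.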
Transduction

Known test set:
- Classification on text databases often means that we know all the data we will work with before training.
- Hence the test set is known a priori.
- This setting is called transductive.

Can we define classifiers that exploit the known test set? Yes!

Transductive SVM (Joachims, ICML 1999):
- Trains the SVM on both training and test set.
- Uses the test data to maximise the margin.
Inductive vs. transductive

Classification task: predict label y from features x.

Classic inductive setting:
- Strategy: learn a classifier on (labelled) training data.
- Goal: the classifier shall generalise to unseen data from the same distribution.

Transductive setting:
- Strategy: learn a classifier on (labelled) training data AND a given (unlabelled) test dataset.
- Goal: predict class labels for this particular dataset.
Why transduction?

Really necessary?
- The classic approach works: train on the training dataset, test on the test dataset.
- That is what we usually do in practice, for instance, in cross-validation.
- We usually ignore or neglect the fact that settings are transductive.

The benefits of transductive classification:
- Inductive setting: infinitely many potential classifiers.
- Transductive setting: finite number of equivalence classes of classifiers.
- f and f' are in the same equivalence class ⇔ f and f' classify the points from the training and test dataset identically.
Why transduction?

Idea of Transductive SVMs:
- Risk on test data ≤ risk on training data + confidence interval (which depends on the number of equivalence classes).
- Theorem by Vapnik (1998): The larger the margin, the lower the number of equivalence classes that contain a classifier with this margin.
- Find the hyperplane that separates the classes in the training data AND in the test data with maximum margin.
[Figure: Why transduction?]
[Figure: Transduction on text]
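Transductive SVM: linearly separable case

$$\min_{w,\,b,\,y^*} \ \frac{1}{2}\|w\|^2$$

subject to

$$\forall_{i=1}^{n}: \ y_i\,[w \cdot x_i + b] \ge 1$$
$$\forall_{j=1}^{k}: \ y_j^*\,[w \cdot x_j^* + b] \ge 1$$

Here $(x_i, y_i)$ are the $n$ labelled training points, and $x_j^*$ are the $k$ test points, whose labels $y_j^*$ are optimised together with $w$ and $b$.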
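Transductive SVM: non-linearly separable case

$$\min_{w,\,b,\,y^*,\,\xi,\,\xi^*} \ \frac{1}{2}\|w\|^2 + C \sum_{i=0}^{n} \xi_i + C^* \sum_{j=0}^{k} \xi_j^*$$

subject to

$$\forall_{i=1}^{n}: \ y_i\,[w \cdot x_i + b] \ge 1 - \xi_i$$
$$\forall_{j=1}^{k}: \ y_j^*\,[w \cdot x_j^* + b] \ge 1 - \xi_j^*$$
$$\forall_{i=1}^{n}: \ \xi_i \ge 0$$
$$\forall_{j=1}^{k}: \ \xi_j^* \ge 0$$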
Transductive SVM: optimisation

How to solve this optimisation problem (OP)?
- Not so nice: it is a combination of an integer and a convex OP.
- Joachims' approach: find an approximate solution by iterative application of an inductive SVM:
  - Train an inductive SVM on the training data, predict on the test data, and assign labels to the test data.
  - Retrain on all data, with special slack weights for the test data $(C^*_-, C^*_+)$.
  - Outer loop: repeat and slowly increase $(C^*_-, C^*_+)$.
  - Inner loop: within each repetition, switch pairs of misclassified data points repeatedly.
- The result is a local search that yields an approximate solution to the OP.
Inductive SVM for TSVM

Variant of the inductive SVM:

$$\min_{w,\,b,\,y^*,\,\xi,\,\xi^*} \ \frac{1}{2}\|w\|^2 + C \sum_{i=0}^{n} \xi_i + C^*_- \sum_{j:\,y_j^* = -1} \xi_j^* + C^*_+ \sum_{j:\,y_j^* = 1} \xi_j^*$$

subject to

$$\forall_{i=1}^{n}: \ y_i\,[w \cdot x_i + b] \ge 1 - \xi_i$$
$$\forall_{j=1}^{k}: \ y_j^*\,[w \cdot x_j^* + b] \ge 1 - \xi_j^*$$

Three different penalty costs:
- $C$ for points from the training dataset
- $C^*_-$ for points from the test dataset currently in class −1
- $C^*_+$ for points from the test dataset currently in class +1
Experiments

[Figure: Average P/R-breakeven point on the Reuters dataset for different training set sizes and a test size of 3,299]
Experiments

[Figure: Average P/R-breakeven point on the Reuters dataset for 17 training documents and varying test set size for the TSVM]
Experiments

[Figure: Average P/R-breakeven point on the WebKB category "course" for different training set sizes]
Experiments

[Figure: Average P/R-breakeven point on the WebKB category "project" for different training set sizes]
Summary

Results:
- Transductive version of the SVM.
- Maximizes the margin on training and test data.
- The implementation uses a variant of the classic inductive SVM.
- The solution is approximate and fast.
- Works well on text, in particular on small training samples and large test sets.
References and further reading

[1] T. Joachims. Transductive Inference for Text Classification using Support Vector Machines. ICML, 1999.
[2] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Elsevier, Morgan-Kaufmann Publishers.
The end

See you tomorrow! Next topic: Graph Mining
Data Mining in Bioinformatics Day 7: Clustering in Bioinformatics
Data Mining in Bioinformatics Day 7: Clustering in Bioinformatics Karsten Borgwardt February 21 to March 4, 2011 Machine Learning & Computational Biology Research Group MPIs Tübingen Karsten Borgwardt:
More informationEffective Diagnosis of Alzheimer s Disease by means of Association Rules
Effective Diagnosis of Alzheimer s Disease by means of Association Rules Rosa Chaves (rosach@ugr.es) J. Ramírez, J.M. Górriz, M. López, D. Salas-Gonzalez, I. Illán, F. Segovia, P. Padilla Dpt. Theory of
More informationData Mining in Bioinformatics Day 9: String & Text Mining in Bioinformatics
Data Mining in Bioinformatics Day 9: String & Text Mining in Bioinformatics Karsten Borgwardt March 1 to March 12, 2010 Machine Learning & Computational Biology Research Group MPIs Tübingen Karsten Borgwardt:
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationMining Low-Support Discriminative Patterns from Dense and High-Dimensional Data. Technical Report
Mining Low-Support Discriminative Patterns from Dense and High-Dimensional Data Technical Report Department of Computer Science and Engineering University of Minnesota 4-192 EECS Building 200 Union Street
More informationThis is the accepted version of this article. To be published as : This is the author version published as:
QUT Digital Repository: http://eprints.qut.edu.au/ This is the author version published as: This is the accepted version of this article. To be published as : This is the author version published as: Chew,
More informationVariable Features Selection for Classification of Medical Data using SVM
Variable Features Selection for Classification of Medical Data using SVM Monika Lamba USICT, GGSIPU, Delhi, India ABSTRACT: The parameters selection in support vector machines (SVM), with regards to accuracy
More informationProposing a New Term Weighting Scheme for Text Categorization
Proposing a New Term Weighting Scheme for Text Categorization Man LAN Institute for Infocomm Research 21 Heng Mui Keng Terrace Singapore 119613 lanman@i2r.a-star.edu.sg Chew-Lim TAN School of Computing
More informationThe Long Tail of Recommender Systems and How to Leverage It
The Long Tail of Recommender Systems and How to Leverage It Yoon-Joo Park Stern School of Business, New York University ypark@stern.nyu.edu Alexander Tuzhilin Stern School of Business, New York University
More informationNMF-Density: NMF-Based Breast Density Classifier
NMF-Density: NMF-Based Breast Density Classifier Lahouari Ghouti and Abdullah H. Owaidh King Fahd University of Petroleum and Minerals - Department of Information and Computer Science. KFUPM Box 1128.
More informationA Comparison of Collaborative Filtering Methods for Medication Reconciliation
A Comparison of Collaborative Filtering Methods for Medication Reconciliation Huanian Zheng, Rema Padman, Daniel B. Neill The H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA, 15213,
More informationAdministrative notes. Computational Thinking ct.cs.ubc.ca
Administrative notes March 14: Midterm 2: this will cover all lectures, labs and readings between Tue Jan 31 and Thu Mar 9 inclusive Practice Midterm 2 is on Exercises webpage: http://www.ugrad.cs.ubc.ca/~cs100/2016w2/
More informationTHE data used in this project is provided. SEIZURE forecasting systems hold promise. Seizure Prediction from Intracranial EEG Recordings
1 Seizure Prediction from Intracranial EEG Recordings Alex Fu, Spencer Gibbs, and Yuqi Liu 1 INTRODUCTION SEIZURE forecasting systems hold promise for improving the quality of life for patients with epilepsy.
More informationPrediction of Alternative Splice Sites in Human Genes
San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research 2007 Prediction of Alternative Splice Sites in Human Genes Douglas Simmons San Jose State University
More informationEvaluating Classifiers for Disease Gene Discovery
Evaluating Classifiers for Disease Gene Discovery Kino Coursey Lon Turnbull khc0021@unt.edu lt0013@unt.edu Abstract Identification of genes involved in human hereditary disease is an important bioinfomatics
More informationAutomatic Medical Coding of Patient Records via Weighted Ridge Regression
Sixth International Conference on Machine Learning and Applications Automatic Medical Coding of Patient Records via Weighted Ridge Regression Jian-WuXu,ShipengYu,JinboBi,LucianVladLita,RaduStefanNiculescuandR.BharatRao
More informationEfficient AUC Optimization for Information Ranking Applications
Efficient AUC Optimization for Information Ranking Applications Sean J. Welleck IBM, USA swelleck@us.ibm.com Abstract. Adequate evaluation of an information retrieval system to estimate future performance
More informationDesign of Multi-Class Classifier for Prediction of Diabetes using Linear Support Vector Machine
Design of Multi-Class Classifier for Prediction of Diabetes using Linear Support Vector Machine Akshay Joshi Anum Khan Omkar Kulkarni Department of Computer Engineering Department of Computer Engineering
More informationLecture 2: Foundations of Concept Learning
Lecture 2: Foundations of Concept Learning Cognitive Systems - Machine Learning Part I: Basic Approaches to Concept Learning Version Space, Candidate Elimination, Inductive Bias last change October 18,
More informationDrug clearance pathway prediction using semi-supervised learning
1,a) 1 1,2 802 Drug clearance pathway prediction using semi-supervised learning Yanagisawa Keisuke 1,a) Ishida Takashi 1 Akiyama Yutaka 1,2 Abstract: Nowadays, drug development requires too much time and
More informationA MODIFIED FREQUENCY BASED TERM WEIGHTING APPROACH FOR INFORMATION RETRIEVAL
Int. J. Chem. Sci.: 14(1), 2016, 449-457 ISSN 0972-768X www.sadgurupublications.com A MODIFIED FREQUENCY BASED TERM WEIGHTING APPROACH FOR INFORMATION RETRIEVAL M. SANTHANAKUMAR a* and C. CHRISTOPHER COLUMBUS
More informationGenetic Algorithm based Feature Extraction for ECG Signal Classification using Neural Network
Genetic Algorithm based Feature Extraction for ECG Signal Classification using Neural Network 1 R. Sathya, 2 K. Akilandeswari 1,2 Research Scholar 1 Department of Computer Science 1 Govt. Arts College,
More informationComputational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project
Computational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project Introduction RNA splicing is a critical step in eukaryotic gene
More informationDevelopment of Soft-Computing techniques capable of diagnosing Alzheimer s Disease in its pre-clinical stage combining MRI and FDG-PET images.
Development of Soft-Computing techniques capable of diagnosing Alzheimer s Disease in its pre-clinical stage combining MRI and FDG-PET images. Olga Valenzuela, Francisco Ortuño, Belen San-Roman, Victor
More informationWinner s Report: KDD CUP Breast Cancer Identification
Winner s Report: KDD CUP Breast Cancer Identification ABSTRACT Claudia Perlich, Prem Melville, Yan Liu, Grzegorz Świrszcz, Richard Lawrence IBM T.J. Watson Research Center Yorktown Heights, NY 10598 {perlich,pmelvil,liuya}@us.ibm.com
More informationImproved Intelligent Classification Technique Based On Support Vector Machines
Improved Intelligent Classification Technique Based On Support Vector Machines V.Vani Asst.Professor,Department of Computer Science,JJ College of Arts and Science,Pudukkottai. Abstract:An abnormal growth
More informationReview: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections
Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections New: Bias-variance decomposition, biasvariance tradeoff, overfitting, regularization, and feature selection Yi
More informationSemi-Automatic Construction of Thyroid Cancer Intervention Corpus from Biomedical Abstracts
jsci2016 Semi-Automatic Construction of Thyroid Cancer Intervention Corpus from Biomedical Wutthipong Kongburan, Praisan Padungweang, Worarat Krathu, Jonathan H. Chan School of Information Technology King
More informationMTAT Bayesian Networks. Introductory Lecture. Sven Laur University of Tartu
MTAT.05.113 Bayesian Networks Introductory Lecture Sven Laur University of Tartu Motivation Probability calculus can be viewed as an extension of classical logic. We use many imprecise and heuristic rules
More informationDetecting and monitoring foodborne illness outbreaks: Twitter communications and the 2015 U.S. Salmonella outbreak linked to imported cucumbers
Detecting and monitoring foodborne illness outbreaks: Twitter communications and the 2015 U.S. Salmonella outbreak linked to imported cucumbers Abstract This research uses Twitter, as a social media device,
More informationRemarks on Bayesian Control Charts
Remarks on Bayesian Control Charts Amir Ahmadi-Javid * and Mohsen Ebadi Department of Industrial Engineering, Amirkabir University of Technology, Tehran, Iran * Corresponding author; email address: ahmadi_javid@aut.ac.ir
More informationAn Improved Algorithm To Predict Recurrence Of Breast Cancer
An Improved Algorithm To Predict Recurrence Of Breast Cancer Umang Agrawal 1, Ass. Prof. Ishan K Rajani 2 1 M.E Computer Engineer, Silver Oak College of Engineering & Technology, Gujarat, India. 2 Assistant
More informationCLASSIFICATION OF BREAST CANCER INTO BENIGN AND MALIGNANT USING SUPPORT VECTOR MACHINES
CLASSIFICATION OF BREAST CANCER INTO BENIGN AND MALIGNANT USING SUPPORT VECTOR MACHINES K.S.NS. Gopala Krishna 1, B.L.S. Suraj 2, M. Trupthi 3 1,2 Student, 3 Assistant Professor, Department of Information
More informationPredictive performance and discrimination in unbalanced classification
MASTER Predictive performance and discrimination in unbalanced classification van der Zon, S.B. Award date: 2016 Link to publication Disclaimer This document contains a student thesis (bachelor's or master's),
More informationPredicting Sleep Using Consumer Wearable Sensing Devices
Predicting Sleep Using Consumer Wearable Sensing Devices Miguel A. Garcia Department of Computer Science Stanford University Palo Alto, California miguel16@stanford.edu 1 Introduction In contrast to the
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 1, Jan Feb 2017
RESEARCH ARTICLE Classification of Cancer Dataset in Data Mining Algorithms Using R Tool P.Dhivyapriya [1], Dr.S.Sivakumar [2] Research Scholar [1], Assistant professor [2] Department of Computer Science
More informationExploiting Similarity to Optimize Recommendations from User Feedback
1 Exploiting Similarity to Optimize Recommendations from User Feedback Hasta Vanchinathan Andreas Krause (Learning and Adaptive Systems Group, D-INF,ETHZ ) Collaborators: Isidor Nikolic (Microsoft, Zurich),
More informationSupervised Learning Approach for Predicting the Presence of Seizure in Human Brain
Supervised Learning Approach for Predicting the Presence of Seizure in Human Brain Sivagami P,Sujitha V M.Phil Research Scholar PSGR Krishnammal College for Women Coimbatore, India sivagamithiru@gmail.com,vsujitha1987@gmail.com
More informationClassification of Mammograms using Gray-level Co-occurrence Matrix and Support Vector Machine Classifier
Classification of Mammograms using Gray-level Co-occurrence Matrix and Support Vector Machine Classifier P.Samyuktha,Vasavi College of engineering,cse dept. D.Sriharsha, IDD, Comp. Sc. & Engg., IIT (BHU),
More informationSVM-Kmeans: Support Vector Machine based on Kmeans Clustering for Breast Cancer Diagnosis
SVM-Kmeans: Support Vector Machine based on Kmeans Clustering for Breast Cancer Diagnosis Walaa Gad Faculty of Computers and Information Sciences Ain Shams University Cairo, Egypt Email: walaagad [AT]
More informationDetection of Cochlear Hearing Loss Applying Wavelet Packets and Support Vector Machines
Detection of Cochlear Hearing Loss Applying Wavelet Packets and Support Vector Machines Hubert Dietl 1, Stephan Weiss 1 1 Dept. Electronics & Computer Science, University of Southampton, UK hwd,s.weiss@ecs.soton.ac.uk
More informationIntroduction to Computational Neuroscience
Introduction to Computational Neuroscience Lecture 5: Data analysis II Lesson Title 1 Introduction 2 Structure and Function of the NS 3 Windows to the Brain 4 Data analysis 5 Data analysis II 6 Single
More informationJournal of Advanced Scientific Research ROUGH SET APPROACH FOR FEATURE SELECTION AND GENERATION OF CLASSIFICATION RULES OF HYPOTHYROID DATA
Kavitha et al., J Adv Sci Res, 2016, 7(2): 15-19 15 Journal of Advanced Scientific Research Available online through http://www.sciensage.info/jasr ISSN 0976-9595 Research Article ROUGH SET APPROACH FOR
More informationPrediction of Successful Memory Encoding from fmri Data
Prediction of Successful Memory Encoding from fmri Data S.K. Balci 1, M.R. Sabuncu 1, J. Yoo 2, S.S. Ghosh 3, S. Whitfield-Gabrieli 2, J.D.E. Gabrieli 2 and P. Golland 1 1 CSAIL, MIT, Cambridge, MA, USA
More informationContents. Just Classifier? Rules. Rules: example. Classification Rule Generation for Bioinformatics. Rule Extraction from a trained network
Contents Classification Rule Generation for Bioinformatics Hyeoncheol Kim Rule Extraction from Neural Networks Algorithm Ex] Promoter Domain Hybrid Model of Knowledge and Learning Knowledge refinement
More informationUsing Association Rule Mining to Discover Temporal Relations of Daily Activities
Using Association Rule Mining to Discover Temporal Relations of Daily Activities Ehsan Nazerfard, Parisa Rashidi, and Diane J. Cook School of Electrical Engineering and Computer Science Washington State
More informationStatement of research interest
Statement of research interest Milos Hauskrecht My primary field of research interest is Artificial Intelligence (AI). Within AI, I am interested in problems related to probabilistic modeling, machine
More informationData mining for Obstructive Sleep Apnea Detection. 18 October 2017 Konstantinos Nikolaidis
Data mining for Obstructive Sleep Apnea Detection 18 October 2017 Konstantinos Nikolaidis Introduction: What is Obstructive Sleep Apnea? Obstructive Sleep Apnea (OSA) is a relatively common sleep disorder
More informationElectrocardiogram beat classification using Discrete Wavelet Transform, higher order statistics and multivariate analysis
Electrocardiogram beat classification using Discrete Wavelet Transform, higher order statistics and multivariate analysis Thripurna Thatipelli 1, Padmavathi Kora 2 1Assistant Professor, Department of ECE,
More informationUsing Information From the Target Language to Improve Crosslingual Text Classification
Using Information From the Target Language to Improve Crosslingual Text Classification Gabriela Ramírez 1, Manuel Montes 1, Luis Villaseñor 1, David Pinto 2 and Thamar Solorio 3 1 Laboratory of Language
More informationSelf-Advising SVM for Sleep Apnea Classification
Self-Advising SVM for Sleep Apnea Classification Yashar Maali 1, Adel Al-Jumaily 1 and Leon Laks 2 1 University of Technology, Sydney (UTS) Faculty of Engineering and IT, Sydney, Australia Yashar.Maali@student.uts.edu.au,
More informationHybrid HMM and HCRF model for sequence classification
Hybrid HMM and HCRF model for sequence classification Y. Soullard and T. Artières University Pierre and Marie Curie - LIP6 4 place Jussieu 75005 Paris - France Abstract. We propose a hybrid model combining
More informationGIANT: Geo-Informative Attributes for Location Recognition and Exploration
GIANT: Geo-Informative Attributes for Location Recognition and Exploration Quan Fang, Jitao Sang, Changsheng Xu Institute of Automation, Chinese Academy of Sciences October 23, 2013 Where is this? La Sagrada
More informationA New Approach for Detection and Classification of Diabetic Retinopathy Using PNN and SVM Classifiers
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 19, Issue 5, Ver. I (Sep.- Oct. 2017), PP 62-68 www.iosrjournals.org A New Approach for Detection and Classification
More informationComputer Age Statistical Inference. Algorithms, Evidence, and Data Science. BRADLEY EFRON Stanford University, California
Computer Age Statistical Inference Algorithms, Evidence, and Data Science BRADLEY EFRON Stanford University, California TREVOR HASTIE Stanford University, California ggf CAMBRIDGE UNIVERSITY PRESS Preface
More informationLinear and Nonlinear Optimization
Linear and Nonlinear Optimization SECOND EDITION Igor Griva Stephen G. Nash Ariela Sofer George Mason University Fairfax, Virginia Society for Industrial and Applied Mathematics Philadelphia Contents Preface
More informationDisease predictive, best drug: big data implementation of drug query with disease prediction, side effects & feedback analysis
Global Journal of Pure and Applied Mathematics. ISSN 0973-1768 Volume 13, Number 6 (2017), pp. 2579-2587 Research India Publications http://www.ripublication.com Disease predictive, best drug: big data
More informationAn Efficient Diseases Classifier based on Microarray Datasets using Clustering ANOVA Extreme Learning Machine (CAELM)
www.ijcsi.org 8 An Efficient Diseases Classifier based on Microarray Datasets using Clustering ANOVA Extreme Learning Machine (CAELM) Shamsan Aljamali 1, Zhang Zuping 2 and Long Jun 3 1 School of Information
More informationCase Studies of Signed Networks
Case Studies of Signed Networks Christopher Wang December 10, 2014 Abstract Many studies on signed social networks focus on predicting the different relationships between users. However this prediction
More informationSpeeding up Greedy Forward Selection for Regularized Least-Squares
Speeding up Greedy Forward Selection for Regularized Least-Squares Tapio Pahikkala, Antti Airola, and Tapio Salakoski Department of Information Technology University of Turku and Turku Centre for Computer
More informationAn Automated Method for Neuronal Spike Source Identification
An Automated Method for Neuronal Spike Source Identification Roberto A. Santiago 1, James McNames 2, Kim Burchiel 3, George G. Lendaris 1 1 NW Computational Intelligence Laboratory, System Science, Portland
More informationIdentification of Tissue Independent Cancer Driver Genes
Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important
More informationClass discovery in Gene Expression Data: Characterizing Splits by Support Vector Machines