Data Mining in Bioinformatics Day 9: String & Text Mining in Bioinformatics

1 Data Mining in Bioinformatics Day 9: String & Text Mining in Bioinformatics
Karsten Borgwardt
March 1 to March 12, 2010
Machine Learning & Computational Biology Research Group, MPIs Tübingen
Karsten Borgwardt: Data Mining in Bioinformatics, Page 1

2 Why compare sequences?
Protein sequences
- Proteins are chains of amino acids.
- 20 different types of amino acids can be found in protein sequences.
- Protein sequences change over time through mutations, deletions, and insertions.
- Different protein sequences may diverge from one common ancestor.
- Their sequences may differ slightly, yet their function is often conserved.

3 Why compare sequences?
Biological question
- Biologists are interested in the reverse direction: given two protein sequences, is it likely that they originate from the same common ancestor?
Computational challenge
- How to measure similarity between two protein sequences, or equivalently, between two strings.
Kernel challenge
- How to measure similarity between two strings via a kernel function. In short: how to define a string kernel.

4 History of sequence comparison
- First phase: Smith-Waterman, BLAST
- Second phase: Profiles, Hidden Markov Models
- Third phase: PSI-BLAST, SAM-T98
- Fourth phase: Kernels

5 Sequence comparison: Phase 1
Idea
- Measure pairwise similarities between sequences with gaps.
Methods
- Smith-Waterman: dynamic programming; high accuracy; slow ($O(n^2)$).
- BLAST: faster heuristic alternative with sufficient accuracy; searches common substrings of fixed length, extends these in both directions, and performs gapped alignment.
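
The dynamic programming behind Smith-Waterman can be sketched as follows. This is a minimal illustration, not the scoring used in practice: the match, mismatch, and gap values are assumptions chosen for readability, whereas protein alignment normally uses a substitution matrix and affine gap penalties. The two nested loops fill one table cell per pair of positions, which is where the quadratic cost comes from.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty).

    The scoring parameters are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("XXACGTYY", "ZZACGTWW"))
```

Only two rows of the table are kept because the sketch returns the score alone; recovering the alignment itself would require the full table and a traceback.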

6 Sequence comparison: Phase 2
Idea
- Collect aggregate statistics from a family of sequences.
- Compare these statistics to a single unlabeled protein.
Methods
- Hidden Markov Models (HMMs): a Markov process with hidden and observable parameters. The forward algorithm determines the probability that a given sequence is the output of a particular HMM.
- Profiles: profiles of sequence families are derived by multiple sequence alignment. A given sequence is compared to this profile.
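
The forward algorithm mentioned above can be sketched in a few lines. The two-state model below (state names, alphabet, and all probabilities) is a made-up toy, not a profile HMM from the lecture, and the sketch assumes a non-empty observation sequence and no explicit end state.

```python
def forward_probability(obs, states, start_p, trans_p, emit_p):
    """Probability that the HMM emits the sequence obs (forward algorithm)."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model over the alphabet {A, G}.
STATES = ("match", "insert")
START_P = {"match": 0.8, "insert": 0.2}
TRANS_P = {
    "match": {"match": 0.9, "insert": 0.1},
    "insert": {"match": 0.5, "insert": 0.5},
}
EMIT_P = {
    "match": {"A": 0.7, "G": 0.3},
    "insert": {"A": 0.5, "G": 0.5},
}

print(forward_probability("AAG", STATES, START_P, TRANS_P, EMIT_P))
```

For long sequences the products underflow, so practical implementations typically work with log probabilities or rescale alpha at every step.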

7 Sequence comparison: Phase 3
Idea
- Create single models from database collections of homologous sequences.
Methods
- PSI-BLAST (position-specific iterative BLAST): builds a profile from the highest-scoring hits in initial BLAST runs, weights positions according to their degree of conservation, and iterates these steps.
- SAM-T98 (now SAM-T02): database search with an HMM built from a multiple sequence alignment.

8 Phase 4: Kernels and SVMs
General idea
- Model differences between classes of sequences.
- Use an SVM classifier to distinguish classes.
- Use a kernel to measure similarity between strings.
Kernels for protein sequences
- SVM-Fisher kernel
- Composition kernel
- Motif kernel
- String kernel

9 SVM-Fisher method
General idea
- Combine HMMs and SVMs for sequence classification.
- Won the best-paper award at ISMB 1999.
Sequence representation
- Fixed-length vector.
- Components are transition and emission probabilities.
- Transformation into Fisher score.

10 SVM-Fisher method
Algorithm
- Model protein family F as an HMM.
- Transform query protein X into a fixed-length vector via the HMM.
- Compute the kernel between X and positive and negative examples of the protein family.
Advantages
- Allows incorporating prior knowledge.
- Can deal with missing data.
- Is interpretable.
- Outperforms competing methods.

11 Composition kernels
General idea
- Model a sequence by its amino acid content.
- Bin amino acids with respect to physico-chemical properties.
Sequence representation
- Feature vector of amino acid frequencies.
- Physico-chemical properties include predicted secondary structure, hydrophobicity, normalized van der Waals volume, polarity, and polarizability.
- Useful database: AAindex.
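
A composition kernel along these lines might look like the sketch below. It assumes a plain dot product between frequency vectors, and the optional grouping `TOY_GROUPS` is a hypothetical two-bin example for illustration only; it is not taken from AAindex or from the lecture.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def composition_vector(seq, groups=None):
    """Frequencies of amino acids, or of property bins if groups is given."""
    groups = groups or {aa: aa for aa in AMINO_ACIDS}
    counts = Counter(groups[aa] for aa in seq if aa in groups)
    total = sum(counts.values())
    labels = sorted(set(groups.values()))
    return [counts[g] / total if total else 0.0 for g in labels]


def composition_kernel(x, y, groups=None):
    """Inner product of the two composition vectors."""
    vx = composition_vector(x, groups)
    vy = composition_vector(y, groups)
    return sum(a * b for a, b in zip(vx, vy))


# Hypothetical binning: residues mapped to a coarse property label.
TOY_GROUPS = {"A": "small", "G": "small", "K": "charged", "D": "charged"}

print(composition_kernel("AGKD", "GGAK", TOY_GROUPS))
```

Because the representation ignores residue order, two sequences with the same composition are indistinguishable to this kernel.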

12 Motif kernels
General idea
- Conserved motifs in amino acid sequences indicate structural and functional relationships.
- Model sequence s as a feature vector f representing motifs.
- The i-th component of f is 1 if and only if s contains the i-th motif.
Motif databases
- PROSITE
- emotifs
- BLOCKS+ (combines several databases)
Generated by
- manual construction
- multiple sequence alignment
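
The binary motif representation can be sketched with regular expressions standing in for motif database entries. The first pattern is the PROSITE N-glycosylation site pattern N-{P}-[ST]-{P} written as a regex; the second is a made-up pattern for illustration, and the dot-product kernel is an assumption.

```python
import re

# Motifs as regular expressions; the second entry is hypothetical.
MOTIFS = ["N[^P][ST][^P]", "C..C"]


def motif_vector(seq, motifs):
    """Component i is 1 if seq contains motif i, else 0."""
    return [1 if re.search(pattern, seq) else 0 for pattern in motifs]


def motif_kernel(x, y, motifs):
    """Number of motifs shared by x and y."""
    return sum(a * b for a, b in zip(motif_vector(x, motifs), motif_vector(y, motifs)))


print(motif_vector("AANGSAA", MOTIFS))
print(motif_kernel("AANGSAA", "CAACNKTL", MOTIFS))
```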

13 Pairwise comparison kernels
General idea
- Employ the empirical kernel map on Smith-Waterman/BLAST scores.
Advantage
- Utilizes decades of practical experience with BLAST.
Disadvantage
- High computational cost ($O(m^3)$).
Alleviation
- Employ BLAST instead of Smith-Waterman.
- Use a vectorization set for the empirical map only.
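
The empirical kernel map can be sketched as follows: each sequence is represented by its vector of pairwise scores against a fixed vectorization set, and the kernel is the inner product of two such vectors. The function `shared_kmer_score` is a hypothetical stand-in for a Smith-Waterman or BLAST score, used only to keep the example self-contained.

```python
def shared_kmer_score(a, b, k=2):
    """Toy similarity: number of distinct k-mers shared by a and b.

    Stand-in for an alignment score; not part of the original method.
    """
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(kmers_a & kmers_b)


def empirical_kernel_map(seq, vectorization_set, score=shared_kmer_score):
    """Represent seq by its scores against every reference sequence."""
    return [score(seq, ref) for ref in vectorization_set]


def pairwise_kernel(x, y, vectorization_set, score=shared_kmer_score):
    """Inner product of the two score vectors."""
    vx = empirical_kernel_map(x, vectorization_set, score)
    vy = empirical_kernel_map(y, vectorization_set, score)
    return sum(a * b for a, b in zip(vx, vy))


REFERENCES = ["ACGT", "TTTT"]
print(empirical_kernel_map("ACGA", REFERENCES))
print(pairwise_kernel("ACGA", "ACGT", REFERENCES))
```

Every kernel evaluation needs one score computation per reference sequence, which is why a cheaper score and a smaller vectorization set both reduce the cost.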

14 Phase 4: String Kernels
General idea
- Count common substrings in two strings.
- A substring of length k is a k-mer.
Variations
- Assign weights to k-mers
- Allow for mismatches
- Allow for gaps
- Include substitutions
- Include wildcards

15 Spectrum Kernel
General idea
- For each l-mer $\alpha \in \Sigma^l$, the coordinate indexed by $\alpha$ is the number of times $\alpha$ occurs in sequence $x$. The l-spectrum feature map is then
  $$\Phi^{\mathrm{Spectrum}}_{l}(x) = (\phi_\alpha(x))_{\alpha \in \Sigma^l},$$
  where $\phi_\alpha(x)$ is the number of occurrences of $\alpha$ in $x$.
- The spectrum kernel is the inner product in the feature space defined by this map:
  $$k^{\mathrm{Spectrum}}_{l}(x, x') = \langle \Phi^{\mathrm{Spectrum}}_{l}(x), \Phi^{\mathrm{Spectrum}}_{l}(x') \rangle$$
- The more common substrings two sequences contain, the more similar they are deemed.
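
A direct, unoptimized sketch of the spectrum kernel stores only the non-zero coordinates of the feature map in a counter. The function names are illustrative; faster implementations based on suffix trees or sorting exist, and this sketch does not attempt them.

```python
from collections import Counter


def spectrum(x, l):
    """Sparse l-spectrum feature map: l-mer -> number of occurrences in x."""
    return Counter(x[i:i + l] for i in range(len(x) - l + 1))


def spectrum_kernel(x, y, l):
    """Inner product of the l-spectrum feature vectors of x and y."""
    fx, fy = spectrum(x, l), spectrum(y, l)
    return sum(count * fy[kmer] for kmer, count in fx.items())


print(spectrum("ABAB", 2))
print(spectrum_kernel("ABAB", "ABA", 2))
```

In the example, "ABAB" contains AB twice and BA once, while "ABA" contains each once, so the inner product is 2·1 + 1·1.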

16 Spectrum Kernel
Principle
- Spectrum kernel: count exactly matching common k-mers.

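17 Mismatch Kernel
General idea
- Do not enforce strictly exact matches.
- Define the mismatch neighborhood of an l-mer $\alpha$ with up to $m$ mismatches:
  $$\Phi^{\mathrm{Mismatch}}_{(l,m)}(\alpha) = (\phi_\beta(\alpha))_{\beta \in \Sigma^l},$$
  where $\phi_\beta(\alpha) = 1$ if $\beta$ differs from $\alpha$ in at most $m$ positions, and $0$ otherwise.
- For a sequence $x$ of any length, the map is then extended as
  $$\Phi^{\mathrm{Mismatch}}_{(l,m)}(x) = \sum_{l\text{-mers } \alpha \text{ in } x} \Phi^{\mathrm{Mismatch}}_{(l,m)}(\alpha)$$
- The mismatch kernel is the inner product in the feature space defined by this map:
  $$k^{\mathrm{Mismatch}}_{(l,m)}(x, x') = \langle \Phi^{\mathrm{Mismatch}}_{(l,m)}(x), \Phi^{\mathrm{Mismatch}}_{(l,m)}(x') \rangle$$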
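
A brute-force sketch of the mismatch kernel follows. It enumerates every candidate l-mer over the alphabet, so it is only practical for small alphabets and short l; the efficient trie-based algorithms from the literature are not reproduced here, and the function names are illustrative.

```python
from collections import Counter
from itertools import product


def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(c != d for c, d in zip(a, b))


def mismatch_features(x, l, m, alphabet):
    """Each l-mer in x adds 1 to every l-mer within m mismatches of it."""
    candidates = ["".join(p) for p in product(alphabet, repeat=l)]
    feats = Counter()
    for i in range(len(x) - l + 1):
        alpha = x[i:i + l]
        for beta in candidates:
            if hamming(alpha, beta) <= m:
                feats[beta] += 1
    return feats


def mismatch_kernel(x, y, l, m, alphabet):
    """Inner product of the mismatch feature vectors of x and y."""
    fx = mismatch_features(x, l, m, alphabet)
    fy = mismatch_features(y, l, m, alphabet)
    return sum(count * fy[beta] for beta, count in fx.items())


print(mismatch_kernel("AB", "BA", 2, 0, "AB"))
print(mismatch_kernel("AB", "BA", 2, 1, "AB"))
```

With m = 0 the kernel reduces to the spectrum kernel. With m = 1 the strings "AB" and "BA" share the neighbors AA and BB even though they have no 2-mer in common.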

18 Mismatch Kernel
Principle
- Mismatch kernel: count common k-mers with at most m mismatches.

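19 Gappy Kernel
General idea
- Allow for gaps in common substrings, that is, move from substrings to subsequences.
- A g-mer $\alpha$ then contributes to all its l-mer subsequences:
  $$\Phi^{\mathrm{Gap}}_{(g,l)}(\alpha) = (\phi_\beta(\alpha))_{\beta \in \Sigma^l},$$
  where $\phi_\beta(\alpha) = 1$ if $\beta$ occurs as a (not necessarily contiguous) subsequence of $\alpha$, and $0$ otherwise.
- For a sequence $x$ of any length, the map is then extended as
  $$\Phi^{\mathrm{Gap}}_{(g,l)}(x) = \sum_{g\text{-mers } \alpha \text{ in } x} \Phi^{\mathrm{Gap}}_{(g,l)}(\alpha)$$
- The gappy kernel is the inner product in the feature space defined by this map:
  $$k^{\mathrm{Gap}}_{(g,l)}(x, x') = \langle \Phi^{\mathrm{Gap}}_{(g,l)}(x), \Phi^{\mathrm{Gap}}_{(g,l)}(x') \rangle$$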
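
The gappy feature map can be sketched by enumerating, for every g-mer, its distinct length-l subsequences. The sketch assumes the binary convention in which each distinct subsequence is counted once per g-mer; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def gappy_features(x, g, l):
    """Each g-mer in x adds 1 to every distinct l-mer subsequence it contains."""
    feats = Counter()
    for i in range(len(x) - g + 1):
        alpha = x[i:i + g]
        # combinations() keeps characters in their original order, so each
        # tuple is a subsequence of alpha; the set removes duplicates.
        feats.update({"".join(c) for c in combinations(alpha, l)})
    return feats


def gappy_kernel(x, y, g, l):
    """Inner product of the gappy feature vectors of x and y."""
    fx, fy = gappy_features(x, g, l), gappy_features(y, g, l)
    return sum(count * fy[beta] for beta, count in fx.items())


print(gappy_features("ABC", 3, 2))
print(gappy_kernel("ABC", "AXC", 3, 2))
```

Setting g equal to l leaves no room for gaps, so the kernel reduces to the spectrum kernel.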

20 Gappy Kernel
Principle
- Gappy kernel: count common l-subsequences of g-mers.

21 Substitution Kernel
General idea
- Replace the mismatch neighborhood by a substitution neighborhood.
- An l-mer $\alpha = a_1 a_2 \ldots a_l$ then contributes to all l-mers in its substitution neighborhood
  $$M_{(l,\sigma)}(\alpha) = \Big\{ \beta = b_1 b_2 \ldots b_l \in \Sigma^l : -\sum_{i=1}^{l} \log P(a_i \mid b_i) < \sigma \Big\}$$
- For a sequence $x$ of any length, the map is then extended as
  $$\Phi^{\mathrm{Sub}}_{(l,\sigma)}(x) = \sum_{l\text{-mers } \alpha \text{ in } x} \Phi^{\mathrm{Sub}}_{(l,\sigma)}(\alpha)$$
- The substitution kernel is now:
  $$k^{\mathrm{Sub}}_{(l,\sigma)}(x, x') = \langle \Phi^{\mathrm{Sub}}_{(l,\sigma)}(x), \Phi^{\mathrm{Sub}}_{(l,\sigma)}(x') \rangle$$
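
The substitution neighborhood can be sketched by brute force over a tiny alphabet. The conditional probabilities in `TOY_COND_PROB` are invented for illustration (real ones would be derived from a substitution matrix), the sketch assumes all probabilities are strictly positive, and the threshold follows the convention that the negated sum of log probabilities must stay below σ.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical P(a | b), keyed as (a, b), for the two-letter alphabet "AB".
TOY_COND_PROB = {("A", "A"): 0.9, ("B", "A"): 0.1,
                 ("A", "B"): 0.2, ("B", "B"): 0.8}


def substitution_neighborhood(alpha, sigma, cond_prob, alphabet):
    """All beta whose substitution cost relative to alpha is below sigma."""
    hood = []
    for beta in product(alphabet, repeat=len(alpha)):
        cost = -sum(math.log(cond_prob[(a, b)]) for a, b in zip(alpha, beta))
        if cost < sigma:
            hood.append("".join(beta))
    return hood


def substitution_features(x, l, sigma, cond_prob, alphabet):
    """Each l-mer in x adds 1 to every l-mer in its substitution neighborhood."""
    feats = Counter()
    for i in range(len(x) - l + 1):
        feats.update(substitution_neighborhood(x[i:i + l], sigma, cond_prob, alphabet))
    return feats


def substitution_kernel(x, y, l, sigma, cond_prob, alphabet):
    """Inner product of the substitution feature vectors of x and y."""
    fx = substitution_features(x, l, sigma, cond_prob, alphabet)
    fy = substitution_features(y, l, sigma, cond_prob, alphabet)
    return sum(count * fy[beta] for beta, count in fx.items())


print(substitution_neighborhood("AA", 2.0, TOY_COND_PROB, "AB"))
```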

22 Substitution Kernel
Principle
- Substitution kernel: count common l-mers within substitution neighborhoods.

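23 Wildcard Kernels
General idea
- Augment the alphabet $\Sigma$ by a wildcard character $*$: $\Sigma \cup \{*\}$.
- Given $\alpha$ from $\Sigma^l$ and $\beta$ from $(\Sigma \cup \{*\})^l$ with at most $m$ occurrences of $*$, the l-mer $\alpha$ contributes to the l-mer $\beta$ if their non-wildcard characters match.
- For a sequence $x$ of any length, the map is then given by
  $$\Phi^{\mathrm{Wildcard}}_{(l,m,\lambda)}(x) = \sum_{l\text{-mers } \alpha \text{ in } x} (\phi_\beta(\alpha))_{\beta \in \mathcal{W}},$$
  where $\mathcal{W}$ is the set of patterns $\beta$ described above, $\phi_\beta(\alpha) = \lambda^j$ if $\alpha$ matches pattern $\beta$ containing $j$ wildcards, $\phi_\beta(\alpha) = 0$ if $\alpha$ does not match $\beta$, and $0 < \lambda \le 1$.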
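
The wildcard feature map can be sketched by generating, for every l-mer, all patterns obtained by replacing up to m positions with the wildcard and weighting each by λ raised to the number of wildcards. The kernel is assumed to be the inner product of these feature vectors, as for the other string kernels; function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def wildcard_features(x, l, m, lam):
    """Each l-mer adds lam**j to every pattern it matches with j <= m wildcards."""
    feats = defaultdict(float)
    for i in range(len(x) - l + 1):
        alpha = x[i:i + l]
        for j in range(min(m, l) + 1):
            for positions in combinations(range(l), j):
                beta = "".join(
                    "*" if k in positions else ch for k, ch in enumerate(alpha)
                )
                feats[beta] += lam ** j
    return feats


def wildcard_kernel(x, y, l, m, lam):
    """Inner product of the wildcard feature vectors of x and y."""
    fx, fy = wildcard_features(x, l, m, lam), wildcard_features(y, l, m, lam)
    return sum(weight * fy.get(beta, 0.0) for beta, weight in fx.items())


print(dict(wildcard_features("AB", 2, 1, 0.5)))
print(wildcard_kernel("AB", "AC", 2, 1, 0.5))
```

In the example the only shared pattern is `A*`, which each string reaches with one wildcard, so the kernel value is 0.5 · 0.5.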

24 Wildcard Kernel
Principle
- Wildcard kernel: count l-mers that match except for wildcards.



More information

Predicting Sleep Using Consumer Wearable Sensing Devices

Predicting Sleep Using Consumer Wearable Sensing Devices Predicting Sleep Using Consumer Wearable Sensing Devices Miguel A. Garcia Department of Computer Science Stanford University Palo Alto, California miguel16@stanford.edu 1 Introduction In contrast to the

More information

Hands-On Ten The BRCA1 Gene and Protein

Hands-On Ten The BRCA1 Gene and Protein Hands-On Ten The BRCA1 Gene and Protein Objective: To review transcription, translation, reading frames, mutations, and reading files from GenBank, and to review some of the bioinformatics tools, such

More information

Data mining for Obstructive Sleep Apnea Detection. 18 October 2017 Konstantinos Nikolaidis

Data mining for Obstructive Sleep Apnea Detection. 18 October 2017 Konstantinos Nikolaidis Data mining for Obstructive Sleep Apnea Detection 18 October 2017 Konstantinos Nikolaidis Introduction: What is Obstructive Sleep Apnea? Obstructive Sleep Apnea (OSA) is a relatively common sleep disorder

More information

Yeast Cells Classification Machine Learning Approach to Discriminate Saccharomyces cerevisiae Yeast Cells Using Sophisticated Image Features.

Yeast Cells Classification Machine Learning Approach to Discriminate Saccharomyces cerevisiae Yeast Cells Using Sophisticated Image Features. Yeast Cells Classification Machine Learning Approach to Discriminate Saccharomyces cerevisiae Yeast Cells Using Sophisticated Image Features. Mohamed Tleis Supervisor: Fons J. Verbeek Leiden University

More information

arxiv: v2 [q-bio.pe] 21 Jan 2008

arxiv: v2 [q-bio.pe] 21 Jan 2008 Viral population estimation using pyrosequencing Nicholas Eriksson 1,, Lior Pachter 2, Yumi Mitsuya 3, Soo-Yon Rhee 3, Chunlin Wang 3, Baback Gharizadeh 4, Mostafa Ronaghi 4, Robert W. Shafer 3, and Niko

More information

1. INTRODUCTION. Vision based Multi-feature HGR Algorithms for HCI using ISL Page 1

1. INTRODUCTION. Vision based Multi-feature HGR Algorithms for HCI using ISL Page 1 1. INTRODUCTION Sign language interpretation is one of the HCI applications where hand gesture plays important role for communication. This chapter discusses sign language interpretation system with present

More information

Unsupervised Identification of Isotope-Labeled Peptides

Unsupervised Identification of Isotope-Labeled Peptides Unsupervised Identification of Isotope-Labeled Peptides Joshua E Goldford 13 and Igor GL Libourel 124 1 Biotechnology institute, University of Minnesota, Saint Paul, MN 55108 2 Department of Plant Biology,

More information

Persons Personality Traits Recognition using Machine Learning Algorithms and Image Processing Techniques

Persons Personality Traits Recognition using Machine Learning Algorithms and Image Processing Techniques Persons Personality Traits Recognition using Machine Learning Algorithms and Image Processing Techniques Kalani Ilmini 1 and TGI Fernando 2 1 Department of Computer Science, University of Sri Jayewardenepura,

More information

Computer Age Statistical Inference. Algorithms, Evidence, and Data Science. BRADLEY EFRON Stanford University, California

Computer Age Statistical Inference. Algorithms, Evidence, and Data Science. BRADLEY EFRON Stanford University, California Computer Age Statistical Inference Algorithms, Evidence, and Data Science BRADLEY EFRON Stanford University, California TREVOR HASTIE Stanford University, California ggf CAMBRIDGE UNIVERSITY PRESS Preface

More information

Stepwise Knowledge Acquisition in a Fuzzy Knowledge Representation Framework

Stepwise Knowledge Acquisition in a Fuzzy Knowledge Representation Framework Stepwise Knowledge Acquisition in a Fuzzy Knowledge Representation Framework Thomas E. Rothenfluh 1, Karl Bögl 2, and Klaus-Peter Adlassnig 2 1 Department of Psychology University of Zurich, Zürichbergstraße

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 1, Jan Feb 2017

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 1, Jan Feb 2017 RESEARCH ARTICLE Classification of Cancer Dataset in Data Mining Algorithms Using R Tool P.Dhivyapriya [1], Dr.S.Sivakumar [2] Research Scholar [1], Assistant professor [2] Department of Computer Science

More information

Project PRACE 1IP, WP7.4

Project PRACE 1IP, WP7.4 Project PRACE 1IP, WP7.4 Plamenka Borovska, Veska Gancheva Computer Systems Department Technical University of Sofia The Team is consists of 5 members: 2 Professors; 1 Assist. Professor; 2 Researchers;

More information

Identification of single de novo drug candidate for dengue and filaria on Aedes aegypti and Culex quinquefasciatus mosquitoes using insilico Protocols

Identification of single de novo drug candidate for dengue and filaria on Aedes aegypti and Culex quinquefasciatus mosquitoes using insilico Protocols Available online at www.ijntps.org ISSN: 2277 2782 INTERNATIONAL JOURNAL OF NOVEL TRENDS IN PHARMACEUTICAL SCIENCES RESEARCH ARTICLE Identification of single de novo drug candidate for dengue and filaria

More information

Sign Language to Number by Neural Network

Sign Language to Number by Neural Network Sign Language to Number by Neural Network Shekhar Singh Assistant Professor CSE, Department PIET, samalkha, Panipat, India Pradeep Bharti Assistant Professor CSE, Department PIET, samalkha, Panipat, India

More information

A micropower support vector machine based seizure detection architecture for embedded medical devices

A micropower support vector machine based seizure detection architecture for embedded medical devices A micropower support vector machine based seizure detection architecture for embedded medical devices The MIT Faculty has made this article openly available. Please share how this access benefits you.

More information

Changes to Biochemistry (4th ed.), 2nd Printing

Changes to Biochemistry (4th ed.), 2nd Printing Changes to Biochemistry (4th ed.), 2nd Printing 1. p. vi, line 7. Change: hydrophobic to hydrophilic. Make the same change on the back cover. 2. p. xx, at the end of the Chap. 17 portion. Add the line:

More information

CISC453 Winter Probabilistic Reasoning Part B: AIMA3e Ch

CISC453 Winter Probabilistic Reasoning Part B: AIMA3e Ch CISC453 Winter 2010 Probabilistic Reasoning Part B: AIMA3e Ch 14.5-14.8 Overview 2 a roundup of approaches from AIMA3e 14.5-14.8 14.5 a survey of approximate methods alternatives to the direct computing

More information

Machine Learning Applied to Perception: Decision-Images for Gender Classification

Machine Learning Applied to Perception: Decision-Images for Gender Classification Machine Learning Applied to Perception: Decision-Images for Gender Classification Felix A. Wichmann and Arnulf B. A. Graf Max Planck Institute for Biological Cybernetics Tübingen, Germany felix.wichmann@tuebingen.mpg.de

More information

Module 3. Genomic data and annotations in public databases Exercises Custom sequence annotation

Module 3. Genomic data and annotations in public databases Exercises Custom sequence annotation Module 3. Genomic data and annotations in public databases Exercises Custom sequence annotation Objectives Upon completion of this exercise, you will be able to use the annotation pipelines provided by

More information

Bayesian Face Recognition Using Gabor Features

Bayesian Face Recognition Using Gabor Features Bayesian Face Recognition Using Gabor Features Xiaogang Wang, Xiaoou Tang Department of Information Engineering The Chinese University of Hong Kong Shatin, Hong Kong {xgwang1,xtang}@ie.cuhk.edu.hk Abstract

More information

Automatic Medical Coding of Patient Records via Weighted Ridge Regression

Automatic Medical Coding of Patient Records via Weighted Ridge Regression Sixth International Conference on Machine Learning and Applications Automatic Medical Coding of Patient Records via Weighted Ridge Regression Jian-WuXu,ShipengYu,JinboBi,LucianVladLita,RaduStefanNiculescuandR.BharatRao

More information

EECS 433 Statistical Pattern Recognition

EECS 433 Statistical Pattern Recognition EECS 433 Statistical Pattern Recognition Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 19 Outline What is Pattern

More information

A Learning Method of Directly Optimizing Classifier Performance at Local Operating Range

A Learning Method of Directly Optimizing Classifier Performance at Local Operating Range A Learning Method of Directly Optimizing Classifier Performance at Local Operating Range Lae-Jeong Park and Jung-Ho Moon Department of Electrical Engineering, Kangnung National University Kangnung, Gangwon-Do,

More information

Predicting Disulfide Connectivity Patterns

Predicting Disulfide Connectivity Patterns 67:262 270 (2007) Predicting Disulfide Connectivity Patterns Chih-Hao Lu, 1 Yu-Ching Chen, 1 Chin-Sheng Yu, 2 and Jenn-Kang Hwang 1,2,3 * 1 Institute of Bioinformatics, National Chiao Tung University,

More information

PIB Ch. 18 Sequence Memory for Prediction, Inference, and Behavior. Jeff Hawkins, Dileep George, and Jamie Niemasik Presented by Jiseob Kim

PIB Ch. 18 Sequence Memory for Prediction, Inference, and Behavior. Jeff Hawkins, Dileep George, and Jamie Niemasik Presented by Jiseob Kim PIB Ch. 18 Sequence Memory for Prediction, Inference, and Behavior Jeff Hawkins, Dileep George, and Jamie Niemasik Presented by Jiseob Kim Quiz Briefly describe the neural activities of minicolumn in the

More information

The Human Behaviour-Change Project

The Human Behaviour-Change Project The Human Behaviour-Change Project Participating organisations A Collaborative Award funded by the www.humanbehaviourchange.org @HBCProject This evening Opening remarks from the chair Mary de Silva, The

More information

Statement of research interest

Statement of research interest Statement of research interest Milos Hauskrecht My primary field of research interest is Artificial Intelligence (AI). Within AI, I am interested in problems related to probabilistic modeling, machine

More information

Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Complete Genomics, Inc.

Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Complete Genomics, Inc. Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Topics Overview of Data Processing Pipeline Overview of Data Files 2 DNA Nano-Ball (DNB) Read Structure Genome : acgtacatgcattcacacatgcttagctatctctcgccag

More information

Quantitative Estimation of Movement Progress during Rehabilitation after Knee/Hip Replacement Surgery

Quantitative Estimation of Movement Progress during Rehabilitation after Knee/Hip Replacement Surgery Quantitative Estimation of Movement Progress during Rehabilitation after Knee/Hip Replacement Surgery by Roshanak Houmanfar A thesis presented to the University of Waterloo in fulfillment of the thesis

More information

SNPrints: Defining SNP signatures for prediction of onset in complex diseases

SNPrints: Defining SNP signatures for prediction of onset in complex diseases SNPrints: Defining SNP signatures for prediction of onset in complex diseases Linda Liu, Biomedical Informatics, Stanford University Daniel Newburger, Biomedical Informatics, Stanford University Grace

More information

Handwriting - marker for Parkinson s Disease

Handwriting - marker for Parkinson s Disease Handwriting - marker for Parkinson s Disease P. Drotár et al. Signal Processing Lab Department of Telecommunications Brno University of Technology 3rd SPLab Workshop, 2013 P. Drotár et al. (Brno University

More information

Outlier Analysis. Lijun Zhang

Outlier Analysis. Lijun Zhang Outlier Analysis Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Extreme Value Analysis Probabilistic Models Clustering for Outlier Detection Distance-Based Outlier Detection Density-Based

More information

EpiGRAPH regression: A toolkit for (epi-)genomic correlation analysis and prediction of quantitative attributes

EpiGRAPH regression: A toolkit for (epi-)genomic correlation analysis and prediction of quantitative attributes EpiGRAPH regression: A toolkit for (epi-)genomic correlation analysis and prediction of quantitative attributes by Konstantin Halachev Supervisors: Christoph Bock Prof. Dr. Thomas Lengauer A thesis submitted

More information

Breast Cancer Diagnosis Based on K-Means and SVM

Breast Cancer Diagnosis Based on K-Means and SVM Breast Cancer Diagnosis Based on K-Means and SVM Mengyao Shi UNC STOR May 4, 2018 Mengyao Shi (UNC STOR) Breast Cancer Diagnosis Based on K-Means and SVM May 4, 2018 1 / 19 Background Cancer is a major

More information