Data Mining in Bioinformatics Day 9: String & Text Mining in Bioinformatics
1 Data Mining in Bioinformatics Day 9: String & Text Mining in Bioinformatics Karsten Borgwardt March 1 to March 12, 2010 Machine Learning & Computational Biology Research Group MPIs Tübingen Karsten Borgwardt: Data Mining in Bioinformatics, Page 1
2 Why compare sequences? Protein sequences: Proteins are chains of amino acids. 20 different types of amino acids can be found in protein sequences. Protein sequences change over time through mutations, deletions, and insertions. Different protein sequences may diverge from one common ancestor. Their sequences may differ slightly, yet their function is often conserved.
3 Why compare sequences? Biological question: Biologists are interested in the reverse direction: given two protein sequences, is it likely that they originate from the same common ancestor? Computational challenge: how to measure similarity between two protein sequences, or equivalently, how to measure similarity between two strings. Kernel challenge: how to measure similarity between two strings via a kernel function. In short: how to define a string kernel.
4 History of sequence comparison. First phase: Smith-Waterman, BLAST. Second phase: profiles, Hidden Markov Models. Third phase: PSI-BLAST, SAM-T98. Fourth phase: kernels.
5 Sequence comparison: Phase 1. Idea: measure pairwise similarities between sequences with gaps. Methods: Smith-Waterman: dynamic programming; high accuracy; slow ($O(n^2)$). BLAST: faster heuristic alternative with sufficient accuracy; searches for common substrings of fixed length, extends these in both directions, and performs gapped alignment.
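The dynamic-programming idea behind Smith-Waterman can be sketched in a few lines. This is a minimal illustration only: the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for the example, a linear gap penalty is used, and only the best local alignment score is returned, not the alignment itself. The two nested loops over both sequences are where the quadratic running time comes from.

```python
def smith_waterman_score(s, t, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of s and t (linear gap penalty).

    The scoring parameters are illustrative assumptions, not values
    taken from the slides.
    """
    rows, cols = len(s) + 1, len(t) + 1
    # H[i][j] = best score of a local alignment ending at s[i-1], t[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + sub,  # align s[i-1] with t[j-1]
                H[i - 1][j] + gap,      # gap in t
                H[i][j - 1] + gap,      # gap in s
            )
            best = max(best, H[i][j])
    return best
```

In practice protein alignment would use a substitution matrix and affine gap penalties rather than these fixed toy scores.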
6 Sequence comparison: Phase 2. Idea: collect aggregate statistics from a family of sequences; compare these statistics to a single unlabeled protein. Methods: Hidden Markov Models (HMMs): Markov process with hidden and observable parameters; the forward algorithm determines the probability that a given sequence is the output of a particular HMM. Profiles: profiles of sequence families are derived by multiple sequence alignment; a given sequence is compared to this profile.
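As a rough sketch of the forward algorithm mentioned above, the function below computes the probability that an HMM emits a given observation sequence by summing over all hidden state paths. The dictionary-based model layout (`start_p`, `trans_p`, `emit_p`) is an assumption made for this example; real profile HMMs for protein families have a more specific state structure and usually work in log space to avoid underflow.

```python
def forward_probability(obs, states, start_p, trans_p, emit_p):
    """P(obs | HMM) for a non-empty observation sequence.

    start_p[s]     : probability of starting in state s
    trans_p[r][s]  : probability of moving from state r to state s
    emit_p[s][o]   : probability that state s emits symbol o
    (This dictionary layout is an illustrative assumption.)
    """
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())
```

The cost is linear in the sequence length and quadratic in the number of states.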
7 Sequence comparison: Phase 3. Idea: create single models from database collections of homologous sequences. Methods: PSI-BLAST (position-specific iterative BLAST): profile from the highest-scoring hits in initial BLAST runs; position weighting according to degree of conservation; iteration of these steps. SAM-T98, now SAM-T02: database search with an HMM built from a multiple sequence alignment.
8 Phase 4: Kernels and SVMs. General idea: model differences between classes of sequences; use an SVM classifier to distinguish classes; use a kernel to measure similarity between strings. Kernels for protein sequences: SVM-Fisher kernel, composition kernel, motif kernel, string kernel.
9 SVM-Fisher method. General idea: combine HMMs and SVMs for sequence classification. Won the best-paper award at ISMB 1999. Sequence representation: fixed-length vector; components are transition and emission probabilities; transformation into Fisher score.
10 SVM-Fisher method. Algorithm: model protein family F as an HMM; transform query protein X into a fixed-length vector via the HMM; compute the kernel between X and positive and negative examples of the protein family. Advantages: allows prior knowledge to be incorporated; can deal with missing data; is interpretable; outperforms competing methods.
11 Composition kernels. General idea: model a sequence by its amino acid content; bin amino acids with respect to physico-chemical properties. Sequence representation: feature vector of amino acid frequencies. Physico-chemical properties include predicted secondary structure, hydrophobicity, normalized van der Waals volume, polarity, and polarizability. Useful database: AAindex.
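A minimal sketch of the frequency-vector representation described above, assuming the 20 standard one-letter amino acid codes and a plain inner product as the kernel. The binning by physico-chemical properties is not shown; the function names are illustrative.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition_vector(seq):
    """Relative frequency of each amino acid in seq (zeros for an empty seq)."""
    counts = Counter(seq)
    n = len(seq)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / n for a in AMINO_ACIDS]

def composition_kernel(x, y):
    """Inner product of the two composition vectors (one simple choice)."""
    return sum(a * b for a, b in zip(composition_vector(x), composition_vector(y)))
```

Because only the content is counted, two sequences with the same letters in a different order get identical feature vectors.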
12 Motif kernels. General idea: conserved motifs in amino acid sequences indicate structural and functional relationships. Model a sequence s as a feature vector f representing motifs: the i-th component of f is 1 if and only if s contains the i-th motif. Motif databases: PROSITE; eMOTIF; BLOCKS+ (combines several databases). Motifs are generated by manual construction or by multiple sequence alignment.
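The binary motif representation can be sketched as follows. The motifs here are written as Python regular expressions and are hypothetical placeholders for illustration; entries from a real motif database would need to be converted to this form first.

```python
import re

def motif_features(seq, motifs):
    """f[i] = 1 if seq contains the i-th motif (a regex), else 0."""
    return [1 if re.search(pattern, seq) else 0 for pattern in motifs]

def motif_kernel(x, y, motifs):
    """Number of motifs that occur in both sequences."""
    fx, fy = motif_features(x, motifs), motif_features(y, motifs)
    return sum(a * b for a, b in zip(fx, fy))

# Hypothetical motif list, for illustration only.
EXAMPLE_MOTIFS = [r"N[^P][ST][^P]", r"C..C"]
```

For example, `motif_features("AANGSAA", EXAMPLE_MOTIFS)` marks the first motif as present and the second as absent.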
13 Pairwise comparison kernels. General idea: employ an empirical kernel map on Smith-Waterman/BLAST scores. Advantage: utilizes decades of practical experience with BLAST. Disadvantage: high computational cost ($O(m^3)$). Alleviation: employ BLAST instead of Smith-Waterman; use a vectorization set for the empirical map only.
14 Phase 4: String Kernels. General idea: count common substrings in two strings. A substring of length k is a k-mer. Variations: assign weights to k-mers; allow for mismatches; allow for gaps; include substitutions; include wildcards.
15 Spectrum Kernel. General idea: For each l-mer $\alpha \in \Sigma^l$, the coordinate indexed by $\alpha$ is the number of times $\alpha$ occurs in sequence $x$. The l-spectrum feature map is then $\Phi^{\mathrm{Spectrum}}_{l}(x) = (\phi_\alpha(x))_{\alpha \in \Sigma^l}$, where $\phi_\alpha(x)$ is the number of occurrences of $\alpha$ in $x$. The spectrum kernel is the inner product in the feature space defined by this map: $k^{\mathrm{Spectrum}}_{l}(x, x') = \langle \Phi^{\mathrm{Spectrum}}_{l}(x), \Phi^{\mathrm{Spectrum}}_{l}(x') \rangle$. The more substrings two sequences have in common, the more similar they are deemed to be.
16 Spectrum Kernel Principle. Spectrum kernel: count exactly matching common k-mers.
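The spectrum kernel can be sketched directly from its definition: count every l-mer in each sequence and take the inner product of the two count vectors. This is a minimal illustration that stores only the non-zero coordinates in a `Counter`; the function names are chosen for this example.

```python
from collections import Counter

def spectrum_features(x, l):
    """Sparse l-spectrum of x: maps each l-mer to its number of occurrences."""
    return Counter(x[i:i + l] for i in range(len(x) - l + 1))

def spectrum_kernel(x, y, l):
    """Inner product of the l-spectrum feature vectors of x and y."""
    fx, fy = spectrum_features(x, l), spectrum_features(y, l)
    # Only l-mers present in x can contribute; fy[a] is 0 for absent l-mers.
    return sum(count * fy[a] for a, count in fx.items())
```

For `x = "ABAB"` and `y = "BABA"` with `l = 2`, the shared 2-mers are `AB` (2 times in x, 1 time in y) and `BA` (1 time in x, 2 times in y), so the kernel value is 2·1 + 1·2 = 4.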
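17 Mismatch Kernel. General idea: do not enforce strictly exact matches. Define the mismatch neighborhood of an l-mer $\alpha$ with up to $m$ mismatches through the map $\Phi^{\mathrm{Mismatch}}_{(l,m)}(\alpha) = (\phi_\beta(\alpha))_{\beta \in \Sigma^l}$, where $\phi_\beta(\alpha) = 1$ if $\beta$ differs from $\alpha$ in at most $m$ positions and $\phi_\beta(\alpha) = 0$ otherwise. For a sequence $x$ of any length, the map is then extended as $\Phi^{\mathrm{Mismatch}}_{(l,m)}(x) = \sum_{l\text{-mers } \alpha \text{ in } x} \Phi^{\mathrm{Mismatch}}_{(l,m)}(\alpha)$. The mismatch kernel is the inner product in the feature space defined by this map: $k^{\mathrm{Mismatch}}_{(l,m)}(x, x') = \langle \Phi^{\mathrm{Mismatch}}_{(l,m)}(x), \Phi^{\mathrm{Mismatch}}_{(l,m)}(x') \rangle$.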
18 Mismatch Kernel Principle. Mismatch kernel: count common k-mers with at most m mismatches.
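A naive sketch of the mismatch kernel follows, assuming that an l-mer contributes 1 to every l-mer within Hamming distance m of it. It enumerates each neighborhood explicitly, which is only practical for small `l`, `m`, and alphabets; efficient implementations use other data structures. All names are illustrative.

```python
from collections import Counter
from itertools import combinations, product

def mismatch_neighborhood(alpha, m, alphabet):
    """All strings over alphabet that differ from alpha in at most m positions."""
    neighbors = {alpha}
    for d in range(1, m + 1):
        for positions in combinations(range(len(alpha)), d):
            for letters in product(alphabet, repeat=d):
                beta = list(alpha)
                for p, c in zip(positions, letters):
                    beta[p] = c
                neighbors.add("".join(beta))  # the set removes duplicates
    return neighbors

def mismatch_features(x, l, m, alphabet):
    """Sum of the neighborhood indicator vectors of all l-mers in x."""
    feats = Counter()
    for i in range(len(x) - l + 1):
        feats.update(mismatch_neighborhood(x[i:i + l], m, alphabet))
    return feats

def mismatch_kernel(x, y, l, m, alphabet):
    """Inner product of the (l, m)-mismatch feature vectors."""
    fx = mismatch_features(x, l, m, alphabet)
    fy = mismatch_features(y, l, m, alphabet)
    return sum(count * fy[b] for b, count in fx.items())
```

With `m = 0` each neighborhood contains only the l-mer itself, so the function reduces to the spectrum kernel.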
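19 Gappy Kernel. General idea: allow for gaps in common substrings, that is, move from substrings to subsequences. A g-mer $\alpha$ then contributes to all its l-mer subsequences: $\Phi^{\mathrm{Gap}}_{(g,l)}(\alpha) = (\phi_\beta(\alpha))_{\beta \in \Sigma^l}$, where $\phi_\beta(\alpha) = 1$ if $\beta$ occurs as a (not necessarily contiguous) subsequence of $\alpha$ and $\phi_\beta(\alpha) = 0$ otherwise. For a sequence $x$ of any length, the map is then extended as $\Phi^{\mathrm{Gap}}_{(g,l)}(x) = \sum_{g\text{-mers } \alpha \text{ in } x} \Phi^{\mathrm{Gap}}_{(g,l)}(\alpha)$. The gappy kernel is the inner product in the feature space defined by this map: $k^{\mathrm{Gap}}_{(g,l)}(x, x') = \langle \Phi^{\mathrm{Gap}}_{(g,l)}(x), \Phi^{\mathrm{Gap}}_{(g,l)}(x') \rangle$.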
20 Gappy Kernel Principle. Gappy kernel: count common l-subsequences of g-mers.
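The gappy kernel admits a similarly naive sketch. It assumes that each g-mer window contributes 1 to every distinct length-l subsequence it contains (an indicator rather than a count of embeddings); that convention is an assumption of this example. `itertools.combinations` preserves the order of characters, which is exactly what a subsequence requires.

```python
from collections import Counter
from itertools import combinations

def gappy_features(x, g, l):
    """Each g-mer of x adds 1 to every distinct l-mer subsequence it contains."""
    feats = Counter()
    for i in range(len(x) - g + 1):
        window = x[i:i + g]
        # combinations keeps characters in their original order
        subsequences = {"".join(chars) for chars in combinations(window, l)}
        feats.update(subsequences)
    return feats

def gappy_kernel(x, y, g, l):
    """Inner product of the (g, l)-gappy feature vectors."""
    fx, fy = gappy_features(x, g, l), gappy_features(y, g, l)
    return sum(count * fy[b] for b, count in fx.items())
```

With `g = l` every window has exactly one subsequence of length l, namely itself, so the function again reduces to the spectrum kernel.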
21 Substitution Kernel. General idea: replace the mismatch neighborhood by a substitution neighborhood. An l-mer $\alpha = a_1 a_2 \dots a_l$ then contributes to all l-mers in its substitution neighborhood $M_{(l,\sigma)}(\alpha) = \{\beta = b_1 b_2 \dots b_l \in \Sigma^l : -\sum_{i=1}^{l} \log P(a_i \mid b_i) < \sigma\}$. For a sequence $x$ of any length, the map is then extended as $\Phi^{\mathrm{Sub}}_{(l,\sigma)}(x) = \sum_{l\text{-mers } \alpha \text{ in } x} \Phi^{\mathrm{Sub}}_{(l,\sigma)}(\alpha)$. The substitution kernel is then $k^{\mathrm{Sub}}_{(l,\sigma)}(x, x') = \langle \Phi^{\mathrm{Sub}}_{(l,\sigma)}(x), \Phi^{\mathrm{Sub}}_{(l,\sigma)}(x') \rangle$.
22 Substitution Kernel Principle. Substitution kernel: count common l-mers in the substitution neighborhood.
23 Wildcard Kernels. General idea: augment the alphabet $\Sigma$ by a wildcard character $*$, giving $\Sigma \cup \{*\}$. Given $\alpha$ from $\Sigma^l$ and $\beta$ from $(\Sigma \cup \{*\})^l$ with at most $m$ occurrences of $*$, the l-mer $\alpha$ contributes to the l-mer $\beta$ if their non-wildcard characters match. For a sequence $x$ of any length, the map is then given by $\Phi^{\mathrm{Wildcard}}_{(l,m,\lambda)}(x) = \sum_{l\text{-mers } \alpha \text{ in } x} (\phi_\beta(\alpha))_{\beta \in W}$, where $W$ is the set of such patterns $\beta$, $\phi_\beta(\alpha) = \lambda^j$ if $\alpha$ matches pattern $\beta$ containing $j$ wildcards, $\phi_\beta(\alpha) = 0$ if $\alpha$ does not match $\beta$, and $0 < \lambda \le 1$.
24 Wildcard Kernel Principle. Wildcard kernel: count l-mers that match except for wildcards.
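A minimal sketch of the wildcard feature map, assuming `*` is the wildcard character and does not occur in the sequences themselves. For every l-mer it generates all patterns with up to `m` positions replaced by `*` and adds the weight `lam ** j` for a pattern with `j` wildcards; the kernel is the inner product of the resulting sparse vectors. The names are illustrative.

```python
from collections import defaultdict
from itertools import combinations

WILDCARD = "*"  # assumed not to appear in the input sequences

def wildcard_features(x, l, m, lam):
    """Sparse (l, m, lam)-wildcard feature vector of x."""
    feats = defaultdict(float)
    for i in range(len(x) - l + 1):
        alpha = x[i:i + l]
        for j in range(min(m, l) + 1):          # j = number of wildcards
            for positions in combinations(range(l), j):
                beta = list(alpha)
                for p in positions:
                    beta[p] = WILDCARD
                feats["".join(beta)] += lam ** j
    return feats

def wildcard_kernel(x, y, l, m, lam):
    """Inner product of the wildcard feature vectors of x and y."""
    fx = wildcard_features(x, l, m, lam)
    fy = wildcard_features(y, l, m, lam)
    return sum(weight * fy.get(b, 0.0) for b, weight in fx.items())
```

For `"AB"` and `"AC"` with `l = 2`, `m = 1`, `lam = 0.5`, the only shared pattern is `A*`, weighted 0.5 on each side, so the kernel value is 0.25.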
More informationModule 3. Genomic data and annotations in public databases Exercises Custom sequence annotation
Module 3. Genomic data and annotations in public databases Exercises Custom sequence annotation Objectives Upon completion of this exercise, you will be able to use the annotation pipelines provided by
More informationBayesian Face Recognition Using Gabor Features
Bayesian Face Recognition Using Gabor Features Xiaogang Wang, Xiaoou Tang Department of Information Engineering The Chinese University of Hong Kong Shatin, Hong Kong {xgwang1,xtang}@ie.cuhk.edu.hk Abstract
More informationAutomatic Medical Coding of Patient Records via Weighted Ridge Regression
Sixth International Conference on Machine Learning and Applications Automatic Medical Coding of Patient Records via Weighted Ridge Regression Jian-WuXu,ShipengYu,JinboBi,LucianVladLita,RaduStefanNiculescuandR.BharatRao
More informationEECS 433 Statistical Pattern Recognition
EECS 433 Statistical Pattern Recognition Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 19 Outline What is Pattern
More informationA Learning Method of Directly Optimizing Classifier Performance at Local Operating Range
A Learning Method of Directly Optimizing Classifier Performance at Local Operating Range Lae-Jeong Park and Jung-Ho Moon Department of Electrical Engineering, Kangnung National University Kangnung, Gangwon-Do,
More informationPredicting Disulfide Connectivity Patterns
67:262 270 (2007) Predicting Disulfide Connectivity Patterns Chih-Hao Lu, 1 Yu-Ching Chen, 1 Chin-Sheng Yu, 2 and Jenn-Kang Hwang 1,2,3 * 1 Institute of Bioinformatics, National Chiao Tung University,
More informationPIB Ch. 18 Sequence Memory for Prediction, Inference, and Behavior. Jeff Hawkins, Dileep George, and Jamie Niemasik Presented by Jiseob Kim
PIB Ch. 18 Sequence Memory for Prediction, Inference, and Behavior Jeff Hawkins, Dileep George, and Jamie Niemasik Presented by Jiseob Kim Quiz Briefly describe the neural activities of minicolumn in the
More informationThe Human Behaviour-Change Project
The Human Behaviour-Change Project Participating organisations A Collaborative Award funded by the www.humanbehaviourchange.org @HBCProject This evening Opening remarks from the chair Mary de Silva, The
More informationStatement of research interest
Statement of research interest Milos Hauskrecht My primary field of research interest is Artificial Intelligence (AI). Within AI, I am interested in problems related to probabilistic modeling, machine
More informationDr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Complete Genomics, Inc.
Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Topics Overview of Data Processing Pipeline Overview of Data Files 2 DNA Nano-Ball (DNB) Read Structure Genome : acgtacatgcattcacacatgcttagctatctctcgccag
More informationQuantitative Estimation of Movement Progress during Rehabilitation after Knee/Hip Replacement Surgery
Quantitative Estimation of Movement Progress during Rehabilitation after Knee/Hip Replacement Surgery by Roshanak Houmanfar A thesis presented to the University of Waterloo in fulfillment of the thesis
More informationSNPrints: Defining SNP signatures for prediction of onset in complex diseases
SNPrints: Defining SNP signatures for prediction of onset in complex diseases Linda Liu, Biomedical Informatics, Stanford University Daniel Newburger, Biomedical Informatics, Stanford University Grace
More informationHandwriting - marker for Parkinson s Disease
Handwriting - marker for Parkinson s Disease P. Drotár et al. Signal Processing Lab Department of Telecommunications Brno University of Technology 3rd SPLab Workshop, 2013 P. Drotár et al. (Brno University
More informationOutlier Analysis. Lijun Zhang
Outlier Analysis Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Extreme Value Analysis Probabilistic Models Clustering for Outlier Detection Distance-Based Outlier Detection Density-Based
More informationEpiGRAPH regression: A toolkit for (epi-)genomic correlation analysis and prediction of quantitative attributes
EpiGRAPH regression: A toolkit for (epi-)genomic correlation analysis and prediction of quantitative attributes by Konstantin Halachev Supervisors: Christoph Bock Prof. Dr. Thomas Lengauer A thesis submitted
More informationBreast Cancer Diagnosis Based on K-Means and SVM
Breast Cancer Diagnosis Based on K-Means and SVM Mengyao Shi UNC STOR May 4, 2018 Mengyao Shi (UNC STOR) Breast Cancer Diagnosis Based on K-Means and SVM May 4, 2018 1 / 19 Background Cancer is a major
More information