Midterm project (Part 2) Due: Monday, November 5, 2018


University of Pittsburgh
CS3750 Advanced Topics in Machine Learning
Professor Milos Hauskrecht

The objective of the midterm project is to gain experience working with the machine learning methods covered in the first half of the course. The project will be conducted on data derived from electronic health records (EHRs). You will implement (with the help of existing libraries) methods for analyzing EHRs using (1) SVD, (2) LDA, and (3) CBOW models. You will analyze the data with these models to demonstrate their benefits in understanding the different patient cases, their similarities and differences, and their value in supporting various prediction tasks.

Teams

The project is a group project. There are five teams of four students each, except for one group that has five members. The groups are posted on the course web site. Each group should work on the problems and solutions independently. Joint development of solutions across groups is strictly prohibited!

Problem

The data we analyze in this project consist of sequences of time-stamped events (or activities) related to patient care. These include lab tests, medications given, and procedures performed on patients. There are multiple sequences, each corresponding to a different patient. Although the LDA, SVD, and CBOW models were originally designed for document and text analysis, one can view other data objects (in our case, patient cases) as documents and the basic entities that form these objects as terms or words. Briefly, a patient case can be thought of as a document, and the events related to patient care are its words. One of the hypotheses to be studied in this project is that the sequences of observed events are related to the problems the patients suffer from (e.g., diseases, complications) in the same way that documents may be represented by different topics.
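To make the document analogy concrete, here is one possible way to turn the event log into per-patient documents. It is a minimal sketch that assumes rows have already been fetched from the database as (adm_id, event_code, time) and (adm_id, procedure_code) tuples. The row layout, the function names, and the "P" prefix used to keep procedure codes apart from event codes are illustrative assumptions, not part of the project specification.

```python
from collections import Counter, defaultdict

def build_patient_documents(event_rows, procedure_rows=()):
    """Turn database rows into one token list ("document") per admission.

    event_rows:     iterable of (adm_id, event_code, time) tuples (assumed layout)
    procedure_rows: iterable of (adm_id, procedure_code) tuples; procedures carry
                    no time stamp, so they are appended after the timed events.
    """
    timed = defaultdict(list)
    for adm_id, event_code, time in event_rows:
        timed[adm_id].append((time, str(event_code)))

    documents = {}
    for adm_id, events in timed.items():
        events.sort()  # chronological order; ties broken by code
        documents[adm_id] = [code for _, code in events]

    for adm_id, proc_code in procedure_rows:
        # Hypothetical "P" prefix keeps procedure codes distinct from event codes.
        documents.setdefault(adm_id, []).append("P" + str(proc_code))
    return documents

def bag_of_words(documents):
    """Discard order and keep per-patient event counts."""
    return {adm_id: Counter(tokens) for adm_id, tokens in documents.items()}

rows = [(172, 69, 3.0), (172, 375, 1.0), (172, 69, 5.5), (752, 638, 2.0)]
docs = build_patient_documents(rows, procedure_rows=[(172, 1035)])
print(docs[172])                # ['375', '69', '69', 'P1035']
print(bag_of_words(docs)[172])  # Counter({'69': 2, '375': 1, 'P1035': 1})
```

The ordered token lists keep the temporal information that a local-window method such as CBOW needs, while the counts are all that a bag-of-words method such as SVD or LDA uses.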
Another hypothesis is that co-occurring events are semantically related and that this relation can be used to define meaningful semantic distances among events, much like the word distances induced by word2vec models applied to text.

Data

The data for the project are stored on the departmental database server cs-mysql-01.cs.pitt.edu. The database name is CS3750_DB. The data are derived from the MIMIC-III dataset of ICU patient cases, which is distributed by physionet.org. The data in the CS3750_DB database reside in five tables.

The first table (ADMISSIONS) enumerates patient instances (patient cases). There are 21,571 patient instances in the database. Each instance in the ADMISSIONS table has four columns with binary entries that reflect aspects of the case, indicating whether the patient (1) is a trauma patient, (2) was admitted due to or experienced a heart-related condition, (3) developed sepsis, and (4) is a transplant patient. We use these entries for the external validation of the SVD and LDA approaches.

The second table (EVENTS) lists the lab and medication events observed for individual patients, together with the times of these events. Please note that the times of patient admissions were shifted so that the first day of every admission is January 1, …. All events are coded.

The third table (EVENT_LIST) is the dictionary that describes the meaning of the event codes.

The fourth table (PROCEDURES) lists special events, procedures, for every patient case (these cover, for example, surgeries). These events are not logged in time; they are provided at the patient level only. Note also that one patient can have multiple procedures.

The fifth table (PROCEDURE_LIST) defines the meaning of the procedure codes.

Data access and rules for working with the data

The data you will work with were prepared for the purposes of this project. They must not be publicly distributed or copied in any way and must reside only on the servers. All of you will need to sign an agreement form outlining the rules for working with the data, including a no-distribution clause.

Tasks

Our objective is to study lower-dimensional representations of patient cases that can define similarities among patients and among events. We start our investigation with singular value decomposition (SVD).

Task 1. SVD analysis

SVD starts from the term-document matrix formed by the bag-of-words (vector-based) representation of the events observed for the different patients and their counts. Alternatively, the matrix can be formed from tf-idf scores. The SVD finds a decomposition of the term-document matrix that is restricted to the internal dimension k (a rank-k approximation). The internal dimension defines the latent space for the documents and words. The matrices forming the decomposition represent patients (documents) and events (words) in the k-dimensional latent space.
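A minimal NumPy sketch of the Task 1 pipeline follows. It assumes documents arrive as event-count dictionaries, orients the matrix as events × patients, and uses a plain log(N/df) tf-idf weighting, which is only one of several variants. Patients are embedded as V_k Σ_k (right singular vectors scaled by the singular values), matching the description of patient similarity in this task. Scaling the event embeddings U_k by Σ_k as well is a choice made here, not something the handout specifies. The energy-based choice of k is one common heuristic that could be a starting point for Part 1.e. All function names are hypothetical.

```python
import numpy as np

def term_document_matrix(bags, vocabulary, use_tfidf=False):
    """Build an events x patients matrix from per-patient event counts.

    bags: list of dicts {event_code: count}, one per patient (column order).
    vocabulary: list of event codes (row order).
    """
    row = {term: i for i, term in enumerate(vocabulary)}
    X = np.zeros((len(vocabulary), len(bags)))
    for j, counts in enumerate(bags):
        for term, count in counts.items():
            X[row[term], j] = count
    if use_tfidf:
        # Plain tf-idf: raw counts times log(N / df); one of several variants.
        df = np.count_nonzero(X, axis=1)
        idf = np.log(X.shape[1] / np.maximum(df, 1))
        X = X * idf[:, None]
    return X

def truncated_svd(X, k):
    """Rank-k SVD X ~ U_k diag(s_k) V_k^T; returns U_k, s_k, V_k."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def k_for_energy(s, fraction=0.9):
    """Smallest k whose singular values keep `fraction` of sum(s**2)."""
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    return min(int(np.searchsorted(energy, fraction)) + 1, len(s))

def cosine_neighbors(embeddings, query, n):
    """Row indices of the n rows most cosine-similar to row `query`."""
    norms = np.linalg.norm(embeddings, axis=1)
    norms[norms == 0] = 1.0  # leave all-zero rows at zero similarity
    unit = embeddings / norms[:, None]
    sims = unit @ unit[query]
    sims[query] = -np.inf    # never return the query itself
    return [int(i) for i in np.argsort(-sims)[:n]]

# Toy data: 4 patients described by 3 event codes.
bags = [{"69": 2, "375": 1}, {"69": 1, "375": 1}, {"638": 3}, {"638": 2, "69": 1}]
vocab = ["69", "375", "638"]
X = term_document_matrix(bags, vocab)
U_k, s_k, V_k = truncated_svd(X, k=2)
patients = V_k * s_k  # rows: patients in the k-dim latent space
events = U_k * s_k    # rows: events (scaling by s_k is an assumption)
print(cosine_neighbors(patients, query=0, n=2))
print(cosine_neighbors(events, query=vocab.index("69"), n=1))
```

With 21,571 patients a dense SVD may be slow or memory-hungry. A sparse truncated solver such as scipy.sparse.linalg.svds, or gensim's LsiModel, is likely a better fit at full scale.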
The similarities among patients (documents) can be calculated by transforming the documents with the right singular vectors of the SVD (multiplied by the matrix of singular values) and applying the cosine similarity. Similarly, the similarities among events (words) can be calculated using the left singular vectors of the SVD and the cosine similarity.

Part 1.a. Get the target patient instances assigned to your group from the ADMISSIONS table (see Appendix 1) and the target events assigned to your group from the EVENT_LIST and PROCEDURE_LIST tables (see Appendix 2).

Part 1.b. Run SVD with internal dimensionality k=20, where instances are defined by the events in the EVENTS and PROCEDURES tables. For every target instance, identify the five instances that are most similar to it based on the SVD. Analyze the differences between the target and the best-matching instances in terms of (1) the events and procedures associated with these instances and (2) the four attributes/labels associated with the instances in the ADMISSIONS table. Do the instances most similar to the targets make sense? Please explain and discuss your conclusions.

Part 1.c. For every target event, identify the 10 events (from EVENT_LIST or PROCEDURE_LIST) most similar to it. Research the relations between these events by searching the web. Are there any associations between the events/procedures that you think do or do not make sense? Please analyze the problem and explain your conclusions.

Part 1.d. In this part, our goal is to validate the usefulness of the lower-dimensional representation of patient cases using the labels assigned to the instances in the ADMISSIONS table. Propose and implement a method for validating the lower-dimensional SVD representation of the patient instances in terms of its ability to predict the labels. Explain and justify your approach. Analyze the results using the proposed approach and compare them to a simple method that assigns labels to instances randomly, without considering any instance-related information.

Part 1.e. Research techniques for determining the dimensionality k of the latent space for the SVD. Report your findings. Repeat the analysis in Parts 1.b-1.d, attempting to find a better choice of k. Compare the results of the analysis for the 20-dimensional space and the new, optimized k-dimensional space. Discuss the results and write your conclusions.

Part 1.f. In all previous analyses we used data consisting of events recorded in both the EVENTS and PROCEDURES tables. Consider restricting the analysis to the events recorded in the EVENTS table only. Repeat the SVD analysis in Parts 1.b-1.e, but omit procedure events. Determine whether the events in the PROCEDURES table are important for defining the latent space and whether they improve the results; that is, compare the results obtained with and without procedures in the representation of the patient case.

Task 2. LDA analysis

An alternative method to define and analyze a low-dimensional representation of patient cases (documents) is Latent Dirichlet Allocation (LDA). Briefly, LDA defines a generative probabilistic model of patient cases. The lower-dimensional representation of an instance is its distribution over topics.
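Comparing two patients under LDA therefore means comparing two probability vectors over topics. The pure-Python sketch below implements the Jensen-Shannon divergence and the Hellinger distance for this purpose and uses them to find the patients closest to a target. It assumes each patient's topic distribution is already available as a dense list of proportions. A trained model such as gensim's LdaModel may return sparse (topic, probability) pairs that would need to be densified first. The helper names and the toy distributions are hypothetical.

```python
import math

def _normalize(p):
    total = float(sum(p))
    return [x / total for x in p]

def _kl(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence with log base 2, so the result lies in [0, 1]."""
    p, q = _normalize(p), _normalize(q)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture; m_i > 0 wherever p_i > 0
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    p, q = _normalize(p), _normalize(q)
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                               for pi, qi in zip(p, q)))

def closest(target_id, topic_vectors, distance=hellinger, n=5):
    """IDs of the n patients whose topic distributions are closest to the target."""
    target = topic_vectors[target_id]
    scored = sorted((distance(target, vec), pid)
                    for pid, vec in topic_vectors.items() if pid != target_id)
    return [pid for _, pid in scored[:n]]

# Hypothetical 3-topic distributions for three admissions.
theta = {172: [0.7, 0.2, 0.1], 752: [0.6, 0.3, 0.1], 1769: [0.1, 0.1, 0.8]}
print(closest(172, theta, n=1))                           # [752]
print(closest(172, theta, distance=jensen_shannon, n=1))  # [752]
```

SciPy's scipy.spatial.distance.jensenshannon returns the Jensen-Shannon distance, which is the square root of the divergence. It uses the natural logarithm unless a base is given, so its values are not directly comparable to the ones computed here. The same functions could also compare events if each event were represented by a distribution over topics. That is one possible direction for Part 2.d, not a prescribed answer.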
The similarities among documents are determined by distance measures between two distributions. Two common choices are the Jensen-Shannon divergence (a symmetrized version of the KL divergence) and the Hellinger distance.

Part 2.a. Consider the same set of target patient instances from the ADMISSIONS table (Appendix 1) as in Part 1.a, and the same set of target events from the EVENT_LIST and PROCEDURE_LIST tables (Appendix 2).

Part 2.b. Run LDA with internal dimensionality k=20, where instances are defined by the events in the EVENTS and PROCEDURES tables. As in Part 1.b, for every target instance find the five instances that are most similar (closest) to it. Analyze the differences between the target and the best-matching instances in terms of (1) the events and procedures associated with these instances and (2) the four attributes/labels associated with the instances in the ADMISSIONS table. Do the closest instances to the targets make sense? Compare the closest instances picked by LDA to the best-matching instances based on the SVD. Analyze and report the differences. Is one of them better than the other? Discuss the results.

Part 2.c. One of the components of the LDA model is the conditional probability distribution of events (words) given the topic. For each topic of the LDA model trained on the patient data, report the 10 most important events (words) defining it. By searching the web, try to assess whether the events most important for each topic are related and whether their combination makes sense. Discuss the results.

Part 2.d. Explain how you would define the similarity/distance between two words in the LDA model. For every target event, apply this approach to identify the 10 events most similar to it, as in Part 1.c for the SVD. Research the relations between these events by searching web resources. Compare the results to those of Part 1.c and discuss the differences.

Part 2.e. Validate the usefulness of the LDA-based lower-dimensional representation of patient cases using the labels assigned to the instances in the ADMISSIONS table. Replicate the external validation approach from Part 1.d. Compare the results obtained for the SVD and LDA models. Discuss the results.

Part 2.f. Select a new dimensionality k (k > 20) for the topic space and repeat the analysis in Parts 2.b-2.e. Compare the results and report your findings. Are any topics (the CPTs for words given the topic) comparable to the topics of the 20-dimensional topic model?

Part 2.g. Restrict the LDA analysis to the events recorded in the EVENTS table only. Repeat the LDA analysis with the 20-dimensional topic space in Parts 2.b-2.e, but omit procedures. Analyze and compare the differences in the results for Parts 2.b-2.e. Use these analyses to decide whether the inclusion of procedures in the representation matters. Also compare and discuss the corresponding results you obtained for the SVD.

Task 3. CBOW analysis

The SVD and LDA analyses work with complete patient cases (documents) to derive similarities between patients as well as similarities between events. CBOW is a method that aims to extract similarities between words (events) only. It does this by considering local relations between events, that is, events that co-occur close in time.
As a result, the latent space is expected to capture mostly local relations among events.

Part 3.a. Consider the first four target events assigned to your group in Appendix 2.

Part 3.b. Run the CBOW model on the event data in the EVENTS table (ignore the events in the PROCEDURES table). Consider latent spaces of dimension 20 and 50. Propose how to define the local window.

Part 3.c. For every target event, identify the 10 events most similar to it based on the low-dimensional embedding defined by the CBOW model. Research the relations between these events by searching the web. Are there any associations that you think do or do not make sense? Are these different from the similarities derived in Parts 1.f and 2.g? Discuss the differences. Which approach do you think makes the most sense for extracting meaningful relations among events?

Task 4. Extensions challenge

Think about different ways of extending the SVD, LDA, or CBOW analyses of data in Electronic Health Records to slightly different or new problems. Pick the one problem you think is the most interesting. Clearly explain the problem and outline the idea of your solution/extension. Justify why the problem and the solution could be interesting for the analysis of EHRs. Implementation and preliminary experiments supporting your idea are not required, but if provided, they will earn you extra credit.
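Part 3.b leaves the definition of the local window open. One possible answer, sketched below in plain Python, makes the window depend on time rather than only on position. Each patient's time-ordered events (from the EVENTS table only) are cut into "sentences" wherever consecutive events are more than max_gap apart, and a positional window is then applied within each sentence. The function names, the hours-based timestamps, and the gap threshold are hypothetical choices for illustration. The second function only shows which (context, target) pairs a CBOW model trains on; it is not the model itself.

```python
def split_by_time_gap(timed_events, max_gap):
    """Split one patient's (time, code) events into 'sentences'.

    A new sentence starts whenever the gap to the previous event exceeds
    max_gap (same unit as the timestamps), so only events that are close
    in time can end up in the same context window.
    """
    sentences, current, last_time = [], [], None
    for time, code in sorted(timed_events):
        if last_time is not None and time - last_time > max_gap:
            sentences.append(current)
            current = []
        current.append(str(code))
        last_time = time
    if current:
        sentences.append(current)
    return sentences

def cbow_examples(sentence, window):
    """(context, target) pairs: up to `window` events on each side predict the target."""
    examples = []
    for i, target in enumerate(sentence):
        context = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
        if context:
            examples.append((context, target))
    return examples

# Hypothetical events for one patient: (hours since admission, event code).
events = [(0.0, 26), (0.5, 31), (1.0, 282), (30.0, 629), (30.2, 26)]
sentences = split_by_time_gap(events, max_gap=6.0)
print(sentences)                              # [['26', '31', '282'], ['629', '26']]
print(cbow_examples(sentences[0], window=1))
# [(['31'], '26'), (['26', '282'], '31'), (['31'], '282')]
```

Sentences built this way could then be passed to gensim's Word2Vec with sg=0 (CBOW), a small positional window, and vector_size=20 or 50 (the parameter was called size in gensim 3.x). The nearest events for Part 3.c could then be read from model.wv.most_similar(code, topn=10). The 6-hour gap and the window of 1 above are arbitrary values for illustration.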

Methods and libraries

The project expects you to take full advantage of existing libraries implementing various machine learning methods. ML researchers typically use Python and Python libraries. Different libraries support the SVD, LDA, and CBOW methodologies. One library that covers all three methods we plan to experiment with is gensim, so it is definitely a good place to start.

Elements machines and Python

Five elements machines (Arsenic, Germanium, Hydrogen, Oxygen, and Selenium) have Python 3.5 installed. You can run it using either virtualenv or conda. If you use neither, invoke Python 3.5 by including the version number in the command (python3.5).

Timeline

Monday, November 5, 2018 (noon): Project reports and programs are due
Tuesday, November 6, 2018 (11:00am): Project presentations (15 minutes per group)

Appendices

Appendix 1. Target patient instances to be considered by the groups when analyzing the similarity between patients.

Group 1: ADM_ID 172, 752, 1769, 2446, 3857
Group 2: ADM_ID 407, 802, 1868, 2589, 4320
Group 3: ADM_ID 503, 896, 1983, 3092, 3966
Group 4: ADM_ID 657, 993, 2157, 3259, 4074
Group 5: ADM_ID 731, 1619, 2266, 3775, 4247

Appendix 2. Target events to be considered by the different groups when analyzing the similarity between events.

Group 1:
69 Antibiotics: Vancomycin
375 Chemistry: Blood: Troponin I
638 Hematology: Blood: Platelet Count
37 Medications: Nitroprusside
1035 Operations On Valves And Septa Of Heart

Group 2:
26 Medications: Furosemide
31 Medications: Norepinephrine
282 Chemistry: Blood: CK-MB Index
629 Hematology: Blood: Neutrophils
1036 Operations On Vessels Of Heart

Group 3:
24 Medications: Fentanyl
10 Medications: Amiodarone
617 Hematology: Blood: Lymphocytes
278 Chemistry: Blood: Cholesterol, HDL
1055 Operations On Kidney

Group 4:
54 Medications: Heparin Sodium
627 Hematology: Blood: Monocytes
610 Hematology: Blood: INR(PT)
55 Medications: Labetalol
1050 Operations On Liver

Group 5:
42 Medications: Vasopressin
648 Hematology: Blood: PTT
621 Hematology: Blood: MCH
286 Chemistry: Blood: Creatinine
1045 Incision, Excision, And Anastomosis Of Intestine


More information

Identifying Individuals Amenable to Drug Recovery Interventions through Computational Analysis of Addiction Content in Social Media

Identifying Individuals Amenable to Drug Recovery Interventions through Computational Analysis of Addiction Content in Social Media 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) Identifying Individuals Amenable to Drug Recovery Interventions through Computational Analysis of Addiction Content in Social

More information

Critical Care Standard Infusion Concentrations

Critical Care Standard Infusion Concentrations Acetylcisteine (NAC) Actrapid - Human Insulin Addiphos 20mmol Addiphos 40mmol Adrenaline (Epinephrine) vs peripheral 10 g in 50mL 200mg per ml 50 units in 50mL 1 unit per ml sodium chloride 20 mmol in

More information

Determination of Delay in :flirn Around Time (TAT) of Stat Tests and its Causes: an AKUH Experience

Determination of Delay in :flirn Around Time (TAT) of Stat Tests and its Causes: an AKUH Experience Determination of Delay in :flirn Around Time (TAT) of Stat Tests and its Causes: an AKUH Experience F. Bilwani,I. Siddiqui,S. Vaqar ( Section of Chemical Pathology, Department of Pathology, Aga Khan University

More information

CRRT Fundamentals Pre-Test. AKI & CRRT 2017 Practice Based Learning in CRRT

CRRT Fundamentals Pre-Test. AKI & CRRT 2017 Practice Based Learning in CRRT CRRT Fundamentals Pre-Test AKI & CRRT 2017 Practice Based Learning in CRRT Question 1 A 72-year-old man with HTN presents to the ED with slurred speech, headache and weakness after falling at home. He

More information

Reveal Relationships in Categorical Data

Reveal Relationships in Categorical Data SPSS Categories 15.0 Specifications Reveal Relationships in Categorical Data Unleash the full potential of your data through perceptual mapping, optimal scaling, preference scaling, and dimension reduction

More information

Chapter. Severe Acute Respiratory Syndrome (SARS) Outbreak in a University Hospital in Hong Kong. Epidemiology-University Hospital Experience

Chapter. Severe Acute Respiratory Syndrome (SARS) Outbreak in a University Hospital in Hong Kong. Epidemiology-University Hospital Experience content Chapter Severe Acute Respiratory Syndrome (SARS) Outbreak in a University Hospital in Hong Kong 3 Nelson Lee, Joseph JY Sung Epidemiology-University Hospital Experience Diagnosis of SARS Clinical

More information

extraction can take place. Another problem is that the treatment for chronic diseases is sequential based upon the progression of the disease.

extraction can take place. Another problem is that the treatment for chronic diseases is sequential based upon the progression of the disease. ix Preface The purpose of this text is to show how the investigation of healthcare databases can be used to examine physician decisions to develop evidence-based treatment guidelines that optimize patient

More information

COMP9444 Neural Networks and Deep Learning 5. Convolutional Networks

COMP9444 Neural Networks and Deep Learning 5. Convolutional Networks COMP9444 Neural Networks and Deep Learning 5. Convolutional Networks Textbook, Sections 6.2.2, 6.3, 7.9, 7.11-7.13, 9.1-9.5 COMP9444 17s2 Convolutional Networks 1 Outline Geometry of Hidden Unit Activations

More information

Predicting Kidney Cancer Survival from Genomic Data

Predicting Kidney Cancer Survival from Genomic Data Predicting Kidney Cancer Survival from Genomic Data Christopher Sauer, Rishi Bedi, Duc Nguyen, Benedikt Bünz Abstract Cancers are on par with heart disease as the leading cause for mortality in the United

More information

Summer A/C Semester, 2018

Summer A/C Semester, 2018 Credit: four (4) hours COURSE SYLLABUS BCH 4024: INTRODUCTION TO BIOCHEMISTRY AND MOLECULAR BIOLOGY COURSE COORDINATOR: Dr. William L. Zeile Summer A/C Semester, 2018 Course Description: BCH 4024 surveys

More information

Finish Strong. Learn It. Live It. A contest for use at chapter meetings. Materials:

Finish Strong. Learn It. Live It. A contest for use at chapter meetings. Materials: Learn It. Live It. A contest for use at chapter meetings Finish Strong By Rochelle Melander Materials: copies of this contest for each member Preparation: Decide ahead of time, as a chapter, on the entry

More information

Guidelines for Tracking Interventional Radiology Patient Care and Procedural Experiences Review Committee for Radiology

Guidelines for Tracking Interventional Radiology Patient Care and Procedural Experiences Review Committee for Radiology Guidelines for Tracking Interventional Radiology Patient Care and Procedural Experiences Review Committee for Radiology To comply with the Program Requirements for Graduate Medical Education in Interventional

More information

. Semi-automatic WordNet Linking using Word Embeddings. Kevin Patel, Diptesh Kanojia and Pushpak Bhattacharyya Presented by: Ritesh Panjwani

. Semi-automatic WordNet Linking using Word Embeddings. Kevin Patel, Diptesh Kanojia and Pushpak Bhattacharyya Presented by: Ritesh Panjwani Semi-automatic WordNet Linking using Word Embeddings Kevin Patel, Diptesh Kanojia and Pushpak Bhattacharyya Presented by: Ritesh Panjwani January 11, 2018 Kevin Patel WordNet Linking via Embeddings 1/22

More information

FINAL REPORT Measuring Semantic Relatedness using a Medical Taxonomy. Siddharth Patwardhan. August 2003

FINAL REPORT Measuring Semantic Relatedness using a Medical Taxonomy. Siddharth Patwardhan. August 2003 FINAL REPORT Measuring Semantic Relatedness using a Medical Taxonomy by Siddharth Patwardhan August 2003 A report describing the research work carried out at the Mayo Clinic in Rochester as part of an

More information

Prof. Edward Aboufadel Department of Mathematics Sabbatical Report for Fall 2011 Sabbatical January 2012

Prof. Edward Aboufadel Department of Mathematics Sabbatical Report for Fall 2011 Sabbatical January 2012 Prof. Edward Aboufadel Department of Mathematics Sabbatical Report for Fall 2011 Sabbatical January 2012 I wish to thank Grand Valley for providing me the time to pursue a research project in the area

More information

An Intelligent Writing Assistant Module for Narrative Clinical Records based on Named Entity Recognition and Similarity Computation

An Intelligent Writing Assistant Module for Narrative Clinical Records based on Named Entity Recognition and Similarity Computation An Intelligent Writing Assistant Module for Narrative Clinical Records based on Named Entity Recognition and Similarity Computation 1,2,3 EMR and Intelligent Expert System Engineering Research Center of

More information

The Good News. More storage capacity allows information to be saved Economic and social forces creating more aggregation of data

The Good News. More storage capacity allows information to be saved Economic and social forces creating more aggregation of data The Good News Capacity to gather medically significant data growing quickly Better instrumentation (e.g., MRI machines, ambulatory monitors, cameras) generates more information/patient More storage capacity

More information

Title: Co-morbidities, complications and causes of death among people with femoral neck fracture - A three-year follow-up study.

Title: Co-morbidities, complications and causes of death among people with femoral neck fracture - A three-year follow-up study. Author s response to reviews Title: Co-morbidities, complications and causes of death among people with femoral neck fracture - A three-year follow-up study. Authors: Monica Berggren (monica.langstrom@umu.se)

More information

Mahoning County Public Health. Epidemiology Response Annex

Mahoning County Public Health. Epidemiology Response Annex Mahoning County Public Health Epidemiology Response Annex Created: May 2006 Updated: February 2015 Mahoning County Public Health Epidemiology Response Annex Table of Contents Epidemiology Response Document

More information

COGS 1: FALL Section E

COGS 1: FALL Section E COGS 1: FALL 2018 Section E Professor Boyle mboyle@ucsd.edu Monday, 2-3:50pm CSB 130 Zoe tzcheng@ucsd.edu Monday, 12-12:50pm CSB 223 Lauren lcurley@ucsd.edu Monday, 2:-250pm CSB 225 Subathra suraj@ucsd.edu

More information

Thesaurus-based value maps as an instrument for psychological research

Thesaurus-based value maps as an instrument for psychological research Thesaurus-based value maps as an instrument for psychological research Markus Christen, University of Zurich & University of 3/25/2013 Page 1 Table of Contents The context of the project The idea of mapping

More information

Click on the Case Entry tab and the Procedure Menu will display. To add new procedures, click on Add.

Click on the Case Entry tab and the Procedure Menu will display. To add new procedures, click on Add. Click on the Case Entry tab and the Procedure Menu will display. To add new procedures, click on Add. After you click on the Add link, the Procedure Entry page will display. If you are a resident your

More information

EECS 433 Statistical Pattern Recognition

EECS 433 Statistical Pattern Recognition EECS 433 Statistical Pattern Recognition Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 19 Outline What is Pattern

More information

Take-Home Final Exam: Mining Regulatory Modules from Gene Expression Data

Take-Home Final Exam: Mining Regulatory Modules from Gene Expression Data Take-Home Final Exam: Mining Regulatory Modules from Gene Expression Data 36-350, Data Mining; Fall 2009 Due at 5 pm on Tuesday, 15 December 2009 There are three problems, each with several parts. All

More information

Gray level cooccurrence histograms via learning vector quantization

Gray level cooccurrence histograms via learning vector quantization Gray level cooccurrence histograms via learning vector quantization Timo Ojala, Matti Pietikäinen and Juha Kyllönen Machine Vision and Media Processing Group, Infotech Oulu and Department of Electrical

More information

LAB TIME/DATE. 1. most numerous leukocyte. 3. also called an erythrocyte; anucleate formed element. 6. ancestral cell of platelets

LAB TIME/DATE. 1. most numerous leukocyte. 3. also called an erythrocyte; anucleate formed element. 6. ancestral cell of platelets ighapmlre29apg245_250 5/12/04 2:46 PM Page 245 impos03 302:bjighapmL:ighapmLrevshts:layouts: NAME Blood LAB TIME/DATE REVIEW SHEET exercise 29A Composition of Blood 1. What is the blood volume of an average-size

More information

AN INFORMATION VISUALIZATION APPROACH TO CLASSIFICATION AND ASSESSMENT OF DIABETES RISK IN PRIMARY CARE

AN INFORMATION VISUALIZATION APPROACH TO CLASSIFICATION AND ASSESSMENT OF DIABETES RISK IN PRIMARY CARE Proceedings of the 3rd INFORMS Workshop on Data Mining and Health Informatics (DM-HI 2008) J. Li, D. Aleman, R. Sikora, eds. AN INFORMATION VISUALIZATION APPROACH TO CLASSIFICATION AND ASSESSMENT OF DIABETES

More information

Your Diabetes Care Records

Your Diabetes Care Records Your Diabetes Care Records Make copies of the charts in this section. These charts list important things you should discuss with your doctor at each visit. Things to Discuss with Your Health Care Team

More information

Answers to end of chapter questions

Answers to end of chapter questions Answers to end of chapter questions Chapter 1 What are the three most important characteristics of QCA as a method of data analysis? QCA is (1) systematic, (2) flexible, and (3) it reduces data. What are

More information

UvA-DARE (Digital Academic Repository)

UvA-DARE (Digital Academic Repository) UvA-DARE (Digital Academic Repository) Superinfection with drug-resistant HIV is rare and does not contribute substantially to therapy failure in a large European cohort Bartha, I.; Assel, M.; Sloot, P.M.A.;

More information

Exploiting Ordinality in Predicting Star Reviews

Exploiting Ordinality in Predicting Star Reviews Exploiting Ordinality in Predicting Star Reviews Alim Virani UBC - Computer Science alim.virani@gmail.com Chris Cameron UBC - Computer Science cchris13@cs.ubc.ca Abstract Automatically evaluating the sentiment

More information

Mature microrna identification via the use of a Naive Bayes classifier

Mature microrna identification via the use of a Naive Bayes classifier Mature microrna identification via the use of a Naive Bayes classifier Master Thesis Gkirtzou Katerina Computer Science Department University of Crete 13/03/2009 Gkirtzou K. (CSD UOC) Mature microrna identification

More information

Measurement and meaningfulness in Decision Modeling

Measurement and meaningfulness in Decision Modeling Measurement and meaningfulness in Decision Modeling Brice Mayag University Paris Dauphine LAMSADE FRANCE Chapter 2 Brice Mayag (LAMSADE) Measurement theory and meaningfulness Chapter 2 1 / 47 Outline 1

More information

Automatic Lung Cancer Detection Using Volumetric CT Imaging Features

Automatic Lung Cancer Detection Using Volumetric CT Imaging Features Automatic Lung Cancer Detection Using Volumetric CT Imaging Features A Research Project Report Submitted To Computer Science Department Brown University By Dronika Solanki (B01159827) Abstract Lung cancer

More information

Quality Improvement of Causes of Death Statistics by Automated Coding in Estonia, 2011

Quality Improvement of Causes of Death Statistics by Automated Coding in Estonia, 2011 Quality Improvement of Causes of Death Statistics by Automated Coding in Estonia, 2011 Technical Implementation report, Grant agreement nr 10501.2009.002-2009.461 Introduction The grant agreement between

More information

Brescia University College POLICIES and PROCEDURES

Brescia University College POLICIES and PROCEDURES Brescia University College POLICIES and PROCEDURES Policy Title: Policy on Alcohol Classification: General Issued by: Administration Approved by: Council of Trustees Effective Date: April 22, 2008 PURPOSE

More information

Inter-session reproducibility measures for high-throughput data sources

Inter-session reproducibility measures for high-throughput data sources Inter-session reproducibility measures for high-throughput data sources Milos Hauskrecht, PhD, Richard Pelikan, MSc Computer Science Department, Intelligent Systems Program, Department of Biomedical Informatics,

More information

IMPaLA tutorial.

IMPaLA tutorial. IMPaLA tutorial http://impala.molgen.mpg.de/ 1. Introduction IMPaLA is a web tool, developed for integrated pathway analysis of metabolomics data alongside gene expression or protein abundance data. It

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016 Exam policy: This exam allows one one-page, two-sided cheat sheet; No other materials. Time: 80 minutes. Be sure to write your name and

More information

An Overview of Human Error

An Overview of Human Error An Overview of Human Error Drawn from J. Reason, Human Error, Cambridge, 1990 Aaron Brown CS 294-4 ROC Seminar Outline Human error and computer system failures A theory of human error Human error and accident

More information

Computer Science 101 Project 2: Predator Prey Model

Computer Science 101 Project 2: Predator Prey Model Computer Science 101 Project 2: Predator Prey Model Real-life situations usually are complicated and difficult to model exactly because of the large number of variables present in real systems. Computer

More information

Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials

Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials Riccardo Miotto and Chunhua Weng Department of Biomedical Informatics Columbia University,

More information

Advice to patients having an angioplasty

Advice to patients having an angioplasty What is an angioplasty? Advice to patients having an angioplasty An angioplasty is an x ray procedure to open a narrowed or blocked artery in order to improve blood flow. It involves inserting a long tube

More information

Lesson 2: Describing the Center of a Distribution

Lesson 2: Describing the Center of a Distribution In previous work with data distributions, you learned how to derive the mean and the median of a data distribution. This lesson builds on your previous work with a center. Exploratory Challenge You will

More information

Brendan O Connor,

Brendan O Connor, Some slides on Paul and Dredze, 2012. Discovering Health Topics in Social Media Using Topic Models. PLOS ONE. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0103408 Brendan O Connor,

More information

Introduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018

Introduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018 Introduction to Machine Learning Katherine Heller Deep Learning Summer School 2018 Outline Kinds of machine learning Linear regression Regularization Bayesian methods Logistic Regression Why we do this

More information

Error Detection based on neural signals

Error Detection based on neural signals Error Detection based on neural signals Nir Even- Chen and Igor Berman, Electrical Engineering, Stanford Introduction Brain computer interface (BCI) is a direct communication pathway between the brain

More information

Christopher Cairns and Elizabeth Plantan. October 9, 2016

Christopher Cairns and Elizabeth Plantan. October 9, 2016 Online appendices to Why autocrats sometimes relax online censorship of sensitive issues: A case study of microblog discussion of air pollution in China Christopher Cairns and Elizabeth Plantan October

More information

Part 1: Bag-of-words models. by Li Fei-Fei (Princeton)

Part 1: Bag-of-words models. by Li Fei-Fei (Princeton) Part 1: Bag-of-words models by Li Fei-Fei (Princeton) Object Bag of words Analogy to documents Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our

More information