A Predictive Chronological Model of Multiple Clinical Observations (Travis Goodwin and Sanda M. Harabagiu)

1 A Predictive Chronological Model of Multiple Clinical Observations
Travis Goodwin and Sanda M. Harabagiu
The University of Texas at Dallas, Human Language Technology Research Institute
http://www.hlt.utdallas.edu

2 Presentation Outline
1. The Problem: Purpose, Background
2. The Dataset: The corpus, Mathematical representation
3. The Approach: Simple model, Bayesian model, Inference
4. Results: Experiments, Conclusions

3 The Problem: Motivation
"personalized medicine has the potential to [improve] patient care and disease prevention [... and] to positively impact two other important trends: the increasing cost of health care and the decreasing rate of new medical product development. The ability to distinguish in advance those patients who will benefit from a given treatment and those who are likely to suffer important adverse effects could result in meaningful cost savings for the overall health care system. Moreover, the ability to stratify patients by disease susceptibility or likely response to treatment could also reduce the size, duration, and cost of clinical trials, thus facilitating the development of new treatments, diagnostics, and prevention strategies."
- The President's Council of Advisors on Science and Technology

4 The Problem: EHRs
- There are an estimated [...] million emergency department visits each year in the United States.
  - 12% (16.4 million) result in hospital admissions
  - The average hospital stay is 4.8 days
- An electronic medical record (EMR) is an individual medical report which documents a variety of clinical observations, such as the patient's diagnoses, risk factors, medications, and test results.
- The electronic health record (EHR) for an individual combines all the EMRs generated during the patient's clinical chronology.
- EHRs document clinical observations made at different times throughout the health management of a patient. However, the clinical course of a disease continues to progress between the times when a physician examines the patient and updates the patient's EHR.

5 The Problem: EHR Goals
The United States government has outlined four major goals for widespread EHR adoption:
1. Track data over time
2. Identify patients who are due for preventive visits and screenings
3. Monitor how patients measure up to certain parameters, such as vaccinations and blood pressure readings
4. Improve overall quality of care in a practice
In this presentation (and the associated paper), we show how each of those goals can be addressed by defining a novel probabilistic model of patients' clinical chronologies.

7 The Dataset
We considered a collection of 1,304 de-identified, longitudinal narrative electronic medical records (EMRs). This collection was provided by the organizers of the shared tasks on Challenges in Language Processing for Clinical Data sponsored by the 2014 Informatics for Integrating Biology and the Bedside (i2b2) and the University of Texas Health Science Center at Houston (UTHealth). The EMRs in this collection document the progression of heart disease for 296 diabetic patients, providing between three and five EMRs for each patient.
Each EMR is associated with:
1. a patient identifier which uniquely identifies the patient associated with the EMR,
2. a de-identified creation date indicating the approximate creation time of the EMR, and
3. a large body of narrative text.

8 The Dataset: Annotations
Each EMR contains manual annotations made by clinical experts. These annotations explicitly document the presence of certain clinical findings and medications relevant to heart disease:
- Diseases (such as CORONARY ARTERY DISEASE (CAD), DIABETES, or OBESITY)
- Risk factors (such as HYPERTENSION or HYPERLIPIDEMIA)
- Medications (such as aspirin)
- Medication types (such as calcium channel blockers)
Each finding or medication was annotated with a temporal signal:
- BEFORE: the finding (or medication) was present at the creation time of the EMR
- AFTER: the finding (or medication) was present only after the creation time of the EMR
- DURING: the finding (or medication) was present through the entire duration of the EMR
We considered only the clinical findings and medications which were observed as present or during; these two temporal signals encompassed 89% of all observations. Instead of modelling the temporal signals, we directly encoded the elapsed time between successive EMRs.

9 The Dataset: Findings

10 The Dataset: Medications

12 The Approach
In order to automatically predict the way a patient's clinical observations might progress based on their medical history, we define a probabilistic temporal prediction model.
- Step 1: discover latent trends in the way clinical observations progressed in a provided collection of patient histories
- Step 2: apply these latent trends to the chronology of a new patient in order to predict how his or her clinical findings might progress
IDEA: when discovering trends, or making predictions, we would prefer to consider only the clinical histories of similar patients.
SOLUTION: learn latent groups, or clusters, of patients based on the trends in the data!

13 The Approach
Given a collection of longitudinal EMRs, we define the following parameters:
- $N$ = the number of patients in our dataset (i.e., 128)
- $L_n$ = the number of EMRs associated with patient $n$ in our dataset (i.e., between 3 and 5)
- $V$ = the number of clinical observations we are modelling (i.e., 5 clinical findings + 22 medications = 27 clinical observations)
- $K$ = the number of latent groups, or clusters, to learn from the dataset
The clinical chronology of all patients in a dataset can be represented with two mathematical structures:
$O = \{O_{n,v,t}\} \in \{0,1\}^{N \times V \times L_n}$
$E = \{E_{n,t}\} \in \mathbb{R}^{N \times L_n}$
where:
- $n \in 1..N$ denotes the patient
- $v \in 1..V$ denotes the clinical observation (finding or medication)
- $t \in 1..L_n$ denotes the index in the chronologically ordered EMR sequence
such that:
- $O_{n,v,t}$ is a binary 3rd-order tensor indicating whether the $v$-th observation was present during the $t$-th EMR in patient $n$'s clinical chronology
- $E_{n,t}$ is a real-valued matrix indicating the elapsed time (in days) between the $t$-th EMR and the previous, $(t-1)$-th, EMR, with $E_{n,0} = 0$
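As a minimal sketch of this representation, the following encodes one patient's chronology as a $V \times L_n$ slice of $O$ and the matching row of $E$. The toy records, the four-item vocabulary, and the `encode_chronology` helper are illustrative assumptions, not part of the dataset or of the authors' code; the first EMR is given an elapsed time of 0.

```python
from datetime import date

# Hypothetical vocabulary of V clinical observations (illustrative only).
OBSERVATIONS = ["CAD", "DIABETES", "HYPERTENSION", "aspirin"]

def encode_chronology(emrs):
    """Encode one patient's EMRs as (O_n, E_n).

    emrs: list of (creation_date, set_of_observation_names).
    Returns O_n as a V x L_n list of 0/1 rows and E_n as a list of L_n
    elapsed times in days, with 0 for the first EMR.
    """
    emrs = sorted(emrs, key=lambda emr: emr[0])  # chronological order
    O_n = [[1 if name in found else 0 for _, found in emrs]
           for name in OBSERVATIONS]
    E_n = [0] + [(emrs[t][0] - emrs[t - 1][0]).days
                 for t in range(1, len(emrs))]
    return O_n, E_n

patient = [
    (date(2014, 1, 10), {"DIABETES"}),
    (date(2014, 3, 11), {"DIABETES", "HYPERTENSION"}),
    (date(2014, 9, 7), {"DIABETES", "HYPERTENSION", "aspirin"}),
]
O_n, E_n = encode_chronology(patient)
print(O_n)  # one row per observation, one column per EMR
print(E_n)  # days since the previous EMR
```

Because $L_n$ differs between patients, a full dataset would be a list of such per-patient pairs (or a padded array) rather than a single rectangular tensor.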

14 The Approach

15 The Approach: Simple Model
Defining a Probabilistic Graphical Model (PGM) requires:
- A set of statistical random variables
- A set of statistical dependencies or independencies between these variables
We define the following statistical random variables:
- A binary variable for each entry in $O_{n,v,t}$
- A continuous variable for each entry in $E_{n,t}$
- A discrete variable $z_n$ indicating which of the $1..K$ latent groups patient $n$ is assigned to

16 The Approach: Simple Model

17 The Approach: Simple Model
We represent the chronological influences between clinical observations in successive EMRs using the following quantities:
- $F_{trans}(u, v, z)$ = the number of patients in group $z$ whose clinical chronology included observation $u$ immediately following observation $v$
- $F_{base}(v, z)$ = the number of patients in group $z$ whose clinical chronology included observation $v$
- $F_{group}(z)$ = the number of patients in group $z$
This allows us to represent three statistical influences, or dependencies:
- The transition probability of an observation $u$ being present given the presence (or absence) of observation $v$ in the previous EMR for a patient in group $z$: $P_{trans}(u \mid v, z) = \frac{F_{trans}(u, v, z)}{F_{base}(v, z)}$
- The base probability of an observation $v$ being present for a patient in group $z$: $P_{base}(v \mid z) = \frac{F_{base}(v, z)}{F_{group}(z)}$
- The temporal probability of an observation $v$ being observed after an elapsed time $x$ for patients in group $z$: $P_{temp}(v \mid x) \sim \mathrm{Exponential}(x; \lambda_v) = \lambda_v e^{-\lambda_v x}$
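The count-based estimates can be sketched for a single group $z$ as follows. This is an illustration under stated assumptions rather than the authors' implementation: `estimate_group` and `temporal_density` are hypothetical helper names, only transitions between present observations are counted, and `f_trans[(u, v)]` counts patients with $v$ in one EMR and $u$ in the next, matching the conditional $P_{trans}(u \mid v, z)$.

```python
import math
from collections import Counter

def estimate_group(chronologies):
    """Count-based estimates for the patients of one latent group z.

    chronologies: one list per patient, each a chronologically ordered
    list of sets of observation names (one set per EMR).
    Returns (p_base, p_trans), where p_trans[(u, v)] estimates
    P_trans(u | v, z): u present right after an EMR containing v.
    """
    f_group = len(chronologies)
    f_base, f_trans = Counter(), Counter()
    for emrs in chronologies:
        # Each patient contributes at most once per observation or pair.
        f_base.update(set().union(*emrs))
        f_trans.update({(u, v)
                        for prev, curr in zip(emrs, emrs[1:])
                        for v in prev for u in curr})
    p_base = {v: f_base[v] / f_group for v in f_base}
    p_trans = {(u, v): f_trans[(u, v)] / f_base[v] for (u, v) in f_trans}
    return p_base, p_trans

def temporal_density(x, lam):
    """Exponential density lam * exp(-lam * x) for an elapsed time x."""
    return lam * math.exp(-lam * x)

group = [
    [{"DIABETES"}, {"DIABETES", "HYPERTENSION"}],
    [{"DIABETES"}, {"DIABETES"}],
]
p_base, p_trans = estimate_group(group)
print(p_base["HYPERTENSION"])                   # 1 of 2 patients
print(p_trans[("HYPERTENSION", "DIABETES")])    # 1 of 2 patients with DIABETES
```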

18 The Approach
This model operates according to the so-called closed-world assumption: the clinical chronologies in our dataset constitute all the possible clinical chronologies that may ever occur. Clearly, this assumption is not always true. Thus, we relax this assumption by introducing a number of prior distributions over the variables in our model and assume that the clinical histories in our dataset were generated according to these prior distributions:
- $O_{n,v,t} \sim \mathrm{Binomial}(\psi_{v,k})$
- $E_{n,t} \sim \mathrm{Exponential}(\lambda_{v,k})$
- $z \sim \mathrm{Multinomial}(\theta)$
Then, we can encode prior knowledge about these distributions using second-order prior distributions:
- $\psi_{v,k} \sim \mathrm{Beta}(\alpha_v, \beta_v)$
- $\lambda_{v,k} \sim \mathrm{Gamma}(\gamma_v, \delta_v)$
- $\theta \sim \mathrm{Dirichlet}(\eta)$
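To make the generative reading of these priors concrete, here is a hedged sketch that samples a synthetic dataset from them. Several simplifications are assumptions of the sketch, not statements from the slides: the hyperparameters are shared across observations, the Gamma prior is read as shape $\gamma$ and rate $\delta$, each binary observation is a single Bernoulli draw, and one elapsed-time rate is used per group because $E_{n,t}$ carries no observation index. The function name `sample_dataset` and all default values are hypothetical.

```python
import random

def sample_dataset(N, V, K, L, alpha=1.0, beta=1.0, gamma=2.0, delta=100.0,
                   eta=1.0, seed=0):
    """Sample N synthetic patients with L EMRs each from the priors."""
    rng = random.Random(seed)
    # theta ~ Dirichlet(eta): normalise K independent Gamma(eta, 1) draws.
    g = [rng.gammavariate(eta, 1.0) for _ in range(K)]
    total = sum(g)
    theta = [x / total for x in g]
    # psi[k][v] ~ Beta(alpha, beta): chance that observation v is present.
    psi = [[rng.betavariate(alpha, beta) for _ in range(V)] for _ in range(K)]
    # lam[k] ~ Gamma(shape=gamma, rate=delta); gammavariate takes a scale.
    lam = [rng.gammavariate(gamma, 1.0 / delta) for _ in range(K)]
    patients = []
    for _ in range(N):
        z = rng.choices(range(K), weights=theta)[0]  # latent group of patient
        O = [[int(rng.random() < psi[z][v]) for _ in range(L)]
             for v in range(V)]
        E = [0.0] + [rng.expovariate(lam[z]) for _ in range(L - 1)]
        patients.append((z, O, E))
    return patients

patients = sample_dataset(N=5, V=27, K=3, L=4)
z, O, E = patients[0]
print(z, len(O), len(O[0]), E)
```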

19 The Approach: Bayesian Model

20 The Approach: Inference
In order to learn the trends from our dataset, we need to find the values of the latent variables in our model: $\lambda_{v,k}$, $\theta$, $\psi_{v,k}$, and $z_n$.
To do this, we used collapsed Gibbs sampling.
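The slides do not give the sampler's update equations, so the following is only a simplified illustration of collapsed Gibbs sampling, not the authors' sampler. It assumes a Beta-Bernoulli mixture over the binary observations alone (elapsed times are ignored), with $\psi$ and $\theta$ integrated out so that only the group assignments $z_n$ are sampled. The function name, the shared hyperparameters, and the toy counts are all hypothetical.

```python
import math
import random

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def collapsed_gibbs(ones, zeros, K, alpha=1.0, beta=1.0, eta=1.0,
                    sweeps=50, seed=0):
    """ones[n][v] / zeros[n][v]: number of patient n's EMRs in which
    observation v was present / absent. Returns one group per patient."""
    rng = random.Random(seed)
    N, V = len(ones), len(ones[0])
    z = [rng.randrange(K) for _ in range(N)]
    size = [0] * K                       # patients per group
    c1 = [[0] * V for _ in range(K)]     # presence counts per group
    c0 = [[0] * V for _ in range(K)]     # absence counts per group

    def update(n, k, sign):
        size[k] += sign
        for v in range(V):
            c1[k][v] += sign * ones[n][v]
            c0[k][v] += sign * zeros[n][v]

    for n in range(N):
        update(n, z[n], +1)
    for _ in range(sweeps):
        for n in range(N):
            update(n, z[n], -1)          # remove patient n from its group
            logp = []
            for k in range(K):
                # Dirichlet-multinomial term times Beta-Bernoulli predictive.
                lp = math.log(size[k] + eta)
                for v in range(V):
                    lp += (log_beta(alpha + c1[k][v] + ones[n][v],
                                    beta + c0[k][v] + zeros[n][v])
                           - log_beta(alpha + c1[k][v], beta + c0[k][v]))
                logp.append(lp)
            top = max(logp)
            weights = [math.exp(lp - top) for lp in logp]
            z[n] = rng.choices(range(K), weights=weights)[0]
            update(n, z[n], +1)          # add patient n to its new group
    return z

# Toy counts for four patients, two observations, three EMRs each.
ones = [[3, 0], [3, 0], [0, 3], [0, 3]]
zeros = [[0, 3], [0, 3], [3, 0], [3, 0]]
assignments = collapsed_gibbs(ones, zeros, K=2, sweeps=20)
print(assignments)
```

Group labels are arbitrary, so only the partition of patients is meaningful, not the label values themselves.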

21 The Approach: Inference
Predicting new patient outcomes:
1. Encode the patient's history using statistical random variables so that we can leverage our probabilistic model:
   - $O_{v,t}$ = binary matrix indicating which clinical findings were present in each of the patient's EMRs
   - $E_t$ = continuous vector of the elapsed time between the patient's EMRs
2. Use our model to assign a latent group to the patient based on his or her medical history: $\hat{z} = \arg\max_{z} P(z \mid O, E)$
3. Use the transition, temporal, and base probabilities associated with that latent group to predict the presence (1) or absence (0) of each clinical finding ($v$): $\hat{w} = \arg\max_{w \in \{0,1\}} P_{base}(v = w \mid z_n) \prod_{u=1}^{V} P_{trans}(v = w \mid u, z_n)$
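Step 3 can be sketched as below for a patient already assigned to a group. The sketch assumes the per-observation factors are multiplied, works in log space, and omits the temporal term; the table layout `p_trans[v][u][s]` (probability that $v$ is present when $u$ had state $s$ in the previous EMR), the helper names, and the toy numbers are illustrative assumptions.

```python
import math

def _clamped_log(p_present, w, eps=1e-9):
    """log P(v = w) given P(v = 1) = p_present, clamped away from log(0)."""
    p = p_present if w == 1 else 1.0 - p_present
    return math.log(min(max(p, eps), 1.0))

def predict_next(p_base, p_trans, prev):
    """Predict presence (1) or absence (0) of each observation.

    p_base[v]: P(v present | group).
    p_trans[v][u][s]: P(v present | u had state s in the previous EMR).
    prev[u]: state (0 or 1) of observation u in the previous EMR.
    """
    V = len(p_base)
    prediction = []
    for v in range(V):
        score = {}
        for w in (0, 1):
            s = _clamped_log(p_base[v], w)
            for u in range(V):
                s += _clamped_log(p_trans[v][u][prev[u]], w)
            score[w] = s
        prediction.append(1 if score[1] > score[0] else 0)
    return prediction

# Toy group parameters for V = 2 observations.
p_base = [0.9, 0.2]
p_trans = [
    [[0.3, 0.95], [0.8, 0.9]],   # tables for predicting observation 0
    [[0.1, 0.3], [0.05, 0.9]],   # tables for predicting observation 1
]
prediction = predict_next(p_base, p_trans, prev=[1, 0])
print(prediction)
```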

23 Results: Experiments
In our experiments, we used the official train/test split of the 2014 i2b2/UTHealth dataset for evaluating risk factor identification:
- Training set: 790 EMRs for 178 patients
- Testing set: 514 EMRs for 118 patients
We attempted to predict the set of observations in the last EMR for each patient, given all the previous EMRs for that patient.
Note: we also performed leave-one-out cross-validation; there were no statistically significant differences.

24 Results: Experiments
For each patient, we considered each observation as a:
- true positive (TP) if it was predicted by the model and mentioned in the EMR
- false positive (FP) if it was predicted by the model but not mentioned in the EMR
- false negative (FN) if it was not predicted by the model but was mentioned in the EMR
- true negative (TN) if it was not predicted by the model and was not mentioned in the EMR
We considered a variety of performance measures:
- Accuracy (Acc.): $\frac{TP + TN}{TP + FP + FN + TN}$
- Positive Predictive Value (PPV, also known as Precision): $\frac{TP}{TP + FP}$
- False Negative Rate (FNR, also known as the miss rate): $\frac{FN}{FN + TP}$
- False Positive Rate (FPR, also known as the fall-out): $\frac{FP}{FP + TN}$
- True Negative Rate (TNR, also known as Specificity): $\frac{TN}{FP + TN}$
- True Positive Rate (TPR, also known as the hit rate or Recall): $\frac{TP}{TP + FN}$
- $F_1$-Measure: $\frac{2TP}{2TP + FP + FN}$
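These measures follow directly from the four counts. The short sketch below computes them; the function name `evaluate` and the example counts are made up for illustration and are not results from the experiments. It assumes every denominator is non-zero.

```python
def evaluate(tp, fp, fn, tn):
    """Compute the listed performance measures from confusion counts."""
    return {
        "Acc": (tp + tn) / (tp + fp + fn + tn),
        "PPV": tp / (tp + fp),            # precision
        "FNR": fn / (fn + tp),            # miss rate
        "FPR": fp / (fp + tn),            # fall-out
        "TNR": tn / (fp + tn),            # specificity
        "TPR": tp / (tp + fn),            # recall
        "F1": 2 * tp / (2 * tp + fp + fn),
    }

# Illustrative counts only.
measures = evaluate(tp=8, fp=2, fn=4, tn=6)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Note that FNR and TPR sum to 1, as do FPR and TNR, so reporting both members of each pair is redundant but convenient for reading the results.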

25 Results: Number of Patient Groups

26 Results: Individual Observations

27 Results: Conclusions
In this presentation (and the associated paper), we presented a novel method for constructing a data-driven probabilistic graphical model of patients' clinical chronologies. We have shown how this model can be used to:
1. Infer latent groups of similar patients from a dataset
2. Discover trends in how clinical observations evolve over time from a dataset
3. Assign new patients to the most similar group in a dataset
4. Predict the most likely progression of clinical findings for a patient
The model we presented does not depend on any a priori knowledge about any particular clinical findings, and instead discovers trends based on latent statistical information. We have shown that this model yields promising performance for predicting risk factors of heart disease in a dataset of diabetic patients.

28 Questions?


More information

L2, Important properties of epidemics and endemic situations

L2, Important properties of epidemics and endemic situations L2, Important properties of epidemics and endemic situations July, 2016 The basic reproduction number Recall: R 0 = expected number individuals a typical infected person infects when everyone is susceptible

More information

IE 5203 Decision Analysis Lab I Probabilistic Modeling, Inference and Decision Making with Netica

IE 5203 Decision Analysis Lab I Probabilistic Modeling, Inference and Decision Making with Netica IE 5203 Decision Analysis Lab I Probabilistic Modeling, Inference and Decision Making with Netica Overview of Netica Software Netica Application is a comprehensive tool for working with Bayesian networks

More information

Bayesian hierarchical modelling

Bayesian hierarchical modelling Bayesian hierarchical modelling Matthew Schofield Department of Mathematics and Statistics, University of Otago Bayesian hierarchical modelling Slide 1 What is a statistical model? A statistical model:

More information

Building Evaluation Scales for NLP using Item Response Theory

Building Evaluation Scales for NLP using Item Response Theory Building Evaluation Scales for NLP using Item Response Theory John Lalor CICS, UMass Amherst Joint work with Hao Wu (BC) and Hong Yu (UMMS) Motivation Evaluation metrics for NLP have been mostly unchanged

More information

GATE CAT Diagnostic Test Accuracy Studies

GATE CAT Diagnostic Test Accuracy Studies GATE: a Graphic Approach To Evidence based practice updates from previous version in red Critically Appraised Topic (CAT): Applying the 5 steps of Evidence Based Practice Using evidence from Assessed by:

More information

Jonathan D. Sugimoto, PhD Lecture Website:

Jonathan D. Sugimoto, PhD Lecture Website: Jonathan D. Sugimoto, PhD jons@fredhutch.org Lecture Website: http://www.cidid.org/transtat/ 1 Introduction to TranStat Lecture 6: Outline Case study: Pandemic influenza A(H1N1) 2009 outbreak in Western

More information

Protein Structure & Function. University, Indianapolis, USA 3 Department of Molecular Medicine, University of South Florida, Tampa, USA

Protein Structure & Function. University, Indianapolis, USA 3 Department of Molecular Medicine, University of South Florida, Tampa, USA Protein Structure & Function Supplement for article entitled MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in

More information

GIANT: Geo-Informative Attributes for Location Recognition and Exploration

GIANT: Geo-Informative Attributes for Location Recognition and Exploration GIANT: Geo-Informative Attributes for Location Recognition and Exploration Quan Fang, Jitao Sang, Changsheng Xu Institute of Automation, Chinese Academy of Sciences October 23, 2013 Where is this? La Sagrada

More information

Action Recognition. Computer Vision Jia-Bin Huang, Virginia Tech. Many slides from D. Hoiem

Action Recognition. Computer Vision Jia-Bin Huang, Virginia Tech. Many slides from D. Hoiem Action Recognition Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem This section: advanced topics Convolutional neural networks in vision Action recognition Vision and Language 3D

More information

TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS)

TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS) TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS) AUTHORS: Tejas Prahlad INTRODUCTION Acute Respiratory Distress Syndrome (ARDS) is a condition

More information

Semi-Automatic Construction of Thyroid Cancer Intervention Corpus from Biomedical Abstracts

Semi-Automatic Construction of Thyroid Cancer Intervention Corpus from Biomedical Abstracts jsci2016 Semi-Automatic Construction of Thyroid Cancer Intervention Corpus from Biomedical Wutthipong Kongburan, Praisan Padungweang, Worarat Krathu, Jonathan H. Chan School of Information Technology King

More information

Bayesian Joint Modelling of Benefit and Risk in Drug Development

Bayesian Joint Modelling of Benefit and Risk in Drug Development Bayesian Joint Modelling of Benefit and Risk in Drug Development EFSPI/PSDM Safety Statistics Meeting Leiden 2017 Disclosure is an employee and shareholder of GSK Data presented is based on human research

More information

SUPPLEMENTARY MATERIAL. Impact of Vaccination on 14 High-Risk HPV type infections: A Mathematical Modelling Approach

SUPPLEMENTARY MATERIAL. Impact of Vaccination on 14 High-Risk HPV type infections: A Mathematical Modelling Approach SUPPLEMENTARY MATERIAL Impact of Vaccination on 14 High-Risk HPV type infections: A Mathematical Modelling Approach Simopekka Vänskä, Kari Auranen, Tuija Leino, Heini Salo, Pekka Nieminen, Terhi Kilpi,

More information

PERFORMANCE MEASURES

PERFORMANCE MEASURES PERFORMANCE MEASURES Of predictive systems DATA TYPES Binary Data point Value A FALSE B TRUE C TRUE D FALSE E FALSE F TRUE G FALSE Real Value Data Point Value a 32.3 b.2 b 2. d. e 33 f.65 g 72.8 ACCURACY

More information

Comparing Two ROC Curves Independent Groups Design

Comparing Two ROC Curves Independent Groups Design Chapter 548 Comparing Two ROC Curves Independent Groups Design Introduction This procedure is used to compare two ROC curves generated from data from two independent groups. In addition to producing a

More information

RESEARCH. Katrina Wilcox Hagberg, 1 Hozefa A Divan, 2 Rebecca Persson, 1 J Curtis Nickel, 3 Susan S Jick 1. open access

RESEARCH. Katrina Wilcox Hagberg, 1 Hozefa A Divan, 2 Rebecca Persson, 1 J Curtis Nickel, 3 Susan S Jick 1. open access open access Risk of erectile dysfunction associated with use of 5-α reductase inhibitors for benign prostatic hyperplasia or alopecia: population based studies using the Clinical Practice Research Datalink

More information

Bayesian Methods LABORATORY. Lesson 1: Jan Software: R. Bayesian Methods p.1/20

Bayesian Methods LABORATORY. Lesson 1: Jan Software: R. Bayesian Methods p.1/20 Bayesian Methods LABORATORY Lesson 1: Jan 24 2002 Software: R Bayesian Methods p.1/20 The R Project for Statistical Computing http://www.r-project.org/ R is a language and environment for statistical computing

More information

Deep Learning Analytics for Predicting Prognosis of Acute Myeloid Leukemia with Cytogenetics, Age, and Mutations

Deep Learning Analytics for Predicting Prognosis of Acute Myeloid Leukemia with Cytogenetics, Age, and Mutations Deep Learning Analytics for Predicting Prognosis of Acute Myeloid Leukemia with Cytogenetics, Age, and Mutations Andy Nguyen, M.D., M.S. Medical Director, Hematopathology, Hematology and Coagulation Laboratory,

More information

Simple Probabilistic Reasoning

Simple Probabilistic Reasoning Simple Probabilistic Reasoning 6.873/HST951 Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support Change over 30 years 1970 s: human knowledge, not much data 2000 s:

More information

NONPARAMETRIC MULTI-LEVEL CLUSTERING OF HUMAN EPILEPSY SEIZURES 1

NONPARAMETRIC MULTI-LEVEL CLUSTERING OF HUMAN EPILEPSY SEIZURES 1 The Annals of Applied Statistics 2016, Vol. 10, No. 2, 667 689 DOI: 10.1214/15-AOAS851 Institute of Mathematical Statistics, 2016 NONPARAMETRIC MULTI-LEVEL CLUSTERING OF HUMAN EPILEPSY SEIZURES 1 BY DRAUSIN

More information

Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials

Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials Riccardo Miotto and Chunhua Weng Department of Biomedical Informatics Columbia University,

More information

Using Bayesian Networks to Analyze Expression Data. Xu Siwei, s Muhammad Ali Faisal, s Tejal Joshi, s

Using Bayesian Networks to Analyze Expression Data. Xu Siwei, s Muhammad Ali Faisal, s Tejal Joshi, s Using Bayesian Networks to Analyze Expression Data Xu Siwei, s0789023 Muhammad Ali Faisal, s0677834 Tejal Joshi, s0677858 Outline Introduction Bayesian Networks Equivalence Classes Applying to Expression

More information

BMI 541/699 Lecture 16

BMI 541/699 Lecture 16 BMI 541/699 Lecture 16 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Proportions & contingency tables -

More information

Remarks on Bayesian Control Charts

Remarks on Bayesian Control Charts Remarks on Bayesian Control Charts Amir Ahmadi-Javid * and Mohsen Ebadi Department of Industrial Engineering, Amirkabir University of Technology, Tehran, Iran * Corresponding author; email address: ahmadi_javid@aut.ac.ir

More information

Benchmark Dose Modeling Cancer Models. Allen Davis, MSPH Jeff Gift, Ph.D. Jay Zhao, Ph.D. National Center for Environmental Assessment, U.S.

Benchmark Dose Modeling Cancer Models. Allen Davis, MSPH Jeff Gift, Ph.D. Jay Zhao, Ph.D. National Center for Environmental Assessment, U.S. Benchmark Dose Modeling Cancer Models Allen Davis, MSPH Jeff Gift, Ph.D. Jay Zhao, Ph.D. National Center for Environmental Assessment, U.S. EPA Disclaimer The views expressed in this presentation are those

More information

Mathematical Model for Pneumonia Dynamics among Children

Mathematical Model for Pneumonia Dynamics among Children Mathematical Model for Pneumonia Dynamics among Children by Jacob Otieno Ong ala Strathmore University, Nairobi (Kenya) at SAMSA 2010, Lilongwe (Malawi) Outline 1. Background information of pneumonia 2.

More information

Simultaneous Measurement Imputation and Outcome Prediction for Achilles Tendon Rupture Rehabilitation

Simultaneous Measurement Imputation and Outcome Prediction for Achilles Tendon Rupture Rehabilitation Simultaneous Measurement Imputation and Outcome Prediction for Achilles Tendon Rupture Rehabilitation Charles Hamesse 1, Paul Ackermann 2, Hedvig Kjellström 1, and Cheng Zhang 3 1 KTH Royal Institute of

More information

Distillation of Knowledge from the Research Literatures on Alzheimer s Dementia

Distillation of Knowledge from the Research Literatures on Alzheimer s Dementia JSCI 2017 1 Distillation of Knowledge from the Research Literatures on Alzheimer s Dementia Wutthipong Kongburan, Mark Chignell, and Jonathan H. Chan School of Information Technology King Mongkut's University

More information

A comparative study of different methods for automatic identification of clopidogrel-induced bleeding in electronic health records

A comparative study of different methods for automatic identification of clopidogrel-induced bleeding in electronic health records A comparative study of different methods for automatic identification of clopidogrel-induced bleeding in electronic health records Hee-Jin Lee School of Biomedical Informatics The University of Texas Health

More information

Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making

Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making Himabindu Lakkaraju Department of Computer Science Stanford University himalv@cs.stanford.edu Jure Leskovec

More information

Meta-analysis using individual participant data: one-stage and two-stage approaches, and why they may differ

Meta-analysis using individual participant data: one-stage and two-stage approaches, and why they may differ Tutorial in Biostatistics Received: 11 March 2016, Accepted: 13 September 2016 Published online 16 October 2016 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/sim.7141 Meta-analysis using

More information

Predicting Breast Cancer Survivability Rates

Predicting Breast Cancer Survivability Rates Predicting Breast Cancer Survivability Rates For data collected from Saudi Arabia Registries Ghofran Othoum 1 and Wadee Al-Halabi 2 1 Computer Science, Effat University, Jeddah, Saudi Arabia 2 Computer

More information

Improved Intelligent Classification Technique Based On Support Vector Machines

Improved Intelligent Classification Technique Based On Support Vector Machines Improved Intelligent Classification Technique Based On Support Vector Machines V.Vani Asst.Professor,Department of Computer Science,JJ College of Arts and Science,Pudukkottai. Abstract:An abnormal growth

More information

Social Affordance Tracking over Time - A Sensorimotor Account of False-Belief Tasks

Social Affordance Tracking over Time - A Sensorimotor Account of False-Belief Tasks Social Affordance Tracking over Time - A Sensorimotor Account of False-Belief Tasks Judith Bütepage (butepage@kth.se) Hedvig Kjellström (hedvig@csc.kth.se) Danica Kragic (dani@kth.se) Computer Vision and

More information

Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation

Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation Ryo Izawa, Naoki Motohashi, and Tomohiro Takagi Department of Computer Science Meiji University 1-1-1 Higashimita,

More information

The use of Topic Modeling to Analyze Open-Ended Survey Items

The use of Topic Modeling to Analyze Open-Ended Survey Items The use of Topic Modeling to Analyze Open-Ended Survey Items W. Holmes Finch Maria E. Hernández Finch Constance E. McIntosh Claire Braun Ball State University Open ended survey items Researchers making

More information

An Empirical Mixture Model for Large-Scale RTT Measurements

An Empirical Mixture Model for Large-Scale RTT Measurements 1 An Empirical Mixture Model for Large-Scale RTT Measurements Romain Fontugne 1,2 Johan Mazel 1,2 Kensuke Fukuda 1,3 1 National Institute of Informatics 2 JFLI 3 Sokendai June 9, 2015 Introduction RTT:

More information

SCHOOL OF MATHEMATICS AND STATISTICS

SCHOOL OF MATHEMATICS AND STATISTICS Data provided: Tables of distributions MAS603 SCHOOL OF MATHEMATICS AND STATISTICS Further Clinical Trials Spring Semester 014 015 hours Candidates may bring to the examination a calculator which conforms

More information

Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers

Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers Dipak K. Dey, Ulysses Diva and Sudipto Banerjee Department of Statistics University of Connecticut, Storrs. March 16,

More information

Machine learning II. Juhan Ernits ITI8600

Machine learning II. Juhan Ernits ITI8600 Machine learning II Juhan Ernits ITI8600 Hand written digit recognition 64 Example 2: Face recogition Classification, regression or unsupervised? How many classes? Example 2: Face recognition Classification,

More information