Code2Vec: Embedding and Clustering Medical Diagnosis Data


2017 IEEE International Conference on Healthcare Informatics

David Kartchner, Tanner Christensen, Jeffrey Humpherys, Sean Wade
Department of Mathematics, Brigham Young University, Provo, Utah, USA

Abstract

Identifying disease comorbidities and grouping medical diagnoses into disease incidents are two important problems in health care delivery and assessment. Using vector space embeddings produced by the Global Vectors (GloVe) algorithm, we are able to find useful vector representations of diagnosis codes that can identify related diagnoses and thus improve identification of related disease incidents.

Keywords: Diagnosis Codes, Embeddings, Clustering, GloVe, Word2Vec

I. INTRODUCTION

One of the fundamental problems of health care is foreseeing and preventing the future health problems of patients. To do so, physicians often identify individuals as high risk for diseases when they observe co-occurring conditions called comorbidities. Diabetes and hypertension, for instance, are strong indicators that an individual is at risk of developing chronic kidney disease. While some comorbidities are obvious, others are more subtle and difficult to detect. In this analysis, we explore the viability of identifying comorbidities through insurance claims data using statistical clustering algorithms on vector space embeddings of medical diagnosis and procedure codes.

Creating meaningful embeddings is useful for at least two reasons. First, grouping claims into disease episodes is fundamental to calculating both the monetary cost and life impact of a particular disease. A stroke, for instance, can have cascading effects beyond initial treatment, such as subsequent falls or injuries caused by mobility impairment. While clinical classification software (CCS) groupings have been created as one means of grouping related diagnoses, such classification systems could miss more subtle relationships.
Since we would expect related diagnoses to group together in our embedded data, such embeddings would provide an additional tool to group related diseases and to classify an individual's medical history into major medical incidents. Second, by taking advantage of large patient populations, our embeddings could both identify comorbidities for hard-to-predict diseases such as epilepsy and provide medical researchers with leads in identifying new comorbidities of more common diseases.

While vector space embeddings are commonly used in natural language processing to capture word meaning, few have translated these concepts into the medical field. One notable exception is [1], which classifies diagnoses, procedures, prescriptions, and various other medical terminology using embeddings learned from insurance claims data. This paper extends the work of [1] by demonstrating that diagnosis code embeddings can be created effectively without neural networks, using the Global Vectors (GloVe) algorithm. It further demonstrates that clustering these embeddings can lend insight into related disease incidents above and beyond that provided by CCS groupings.

II. DATA

To explore the viability of comorbidity identification via vector space embeddings, we used a database of approximately 2 million insurance claims generated by roughly 90,000 individuals over the course of 5 years. Though traditional medical insurance claims include up to four diagnoses and a procedure code, our data are limited to a single diagnosis code per insurance claim, so we restrict our analysis to medical diagnoses. To make our data analogous to a corpus of text, we list each individual's diagnoses in roughly chronological order (it is impossible to distinguish the order of claims filed on the same day) and add a dummy code in the place of every month in which the individual received no medical diagnosis.
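The corpus construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing code: the `(patient_id, date, code)` record layout, the function names, and the `<NO_DX>` placeholder token are all hypothetical, and dummy codes are inserted only for empty months that fall between two of a patient's claims.

```python
from collections import defaultdict
from datetime import date

DUMMY = "<NO_DX>"  # hypothetical placeholder token for a month with no diagnosis


def month_index(d):
    """Map a date to a running month count so that gaps are easy to measure."""
    return d.year * 12 + (d.month - 1)


def build_corpus(claims):
    """Turn (patient_id, date, code) records into one code sequence per patient.

    Codes are ordered by date; claims filed on the same day keep their input
    order because Python's sort is stable. One DUMMY token is added for every
    calendar month without a claim between two consecutive claims.
    """
    by_patient = defaultdict(list)
    for patient_id, day, code in claims:
        by_patient[patient_id].append((day, code))

    corpus = {}
    for patient_id, rows in by_patient.items():
        rows.sort(key=lambda row: row[0])
        sequence = []
        previous_month = None
        for day, code in rows:
            current_month = month_index(day)
            if previous_month is not None and current_month - previous_month > 1:
                sequence.extend([DUMMY] * (current_month - previous_month - 1))
            sequence.append(code)
            previous_month = current_month
        corpus[patient_id] = sequence
    return corpus


# Illustrative records only; the codes are arbitrary strings here.
example_claims = [
    ("p1", date(2015, 1, 10), "250.00"),
    ("p1", date(2015, 4, 2), "585.6"),
    ("p1", date(2015, 1, 5), "401.9"),
]
example_corpus = build_corpus(example_claims)
```

Each resulting sequence plays the role of a document, so any co-occurrence window that slides over it sees placeholder tokens, rather than distant diagnoses, across long gaps.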
Adding dummy codes ensures that diagnoses that occur years apart do not co-occur, which could otherwise happen if an individual temporarily switched insurance providers or simply did not consume any health services for an extended period of time. We then used these claims to train a GloVe model (described in Section III below) to obtain 25-dimensional representations of each of the 8,477 codes that appear at least 5 times in our database.

III. METHODS

A. GloVe

GloVe is a method originally developed for finding vector space word embeddings based on word context (i.e., co-occurrence with other words) [2]. GloVe assumes that words with similar meaning occur in similar contexts and uses this information to find vectors that capture this similarity. GloVe learns to represent semantic meaning by considering words in our corpus pairwise and comparing the probability that each co-occurs with a given context word. To do so, define $X_{ij}$ to be the number of times word $j$ appears in the context of word $i$ and $X_i = \sum_j X_{ij}$ to be the total number of times any word appears in the context of word $i$. GloVe seeks to find word vectors $w_i$ and context vectors $\tilde{w}_j$ that minimize the cost functional

$$J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2, \tag{1}$$

where $V$ is the vocabulary size. This is essentially a weighted least squares problem. The pieces of $J$ are as follows:

1) The weighting function $f$ is a non-decreasing, bounded function that increases the weight of frequent co-occurrences while down-weighting infrequent co-occurrences. Moreover, $f$ is chosen to be relatively small for large $x$ so that frequent words are not excessively overweighted. In practice, $f$ is chosen to be

$$f(x) = \begin{cases} (x / x_{\max})^{\alpha} & x \le x_{\max} \\ 1 & x > x_{\max} \end{cases}$$

where $\alpha = 0.75$ and $x_{\max} = 100$ work well empirically.
2) $w_i$ and $\tilde{w}_j$ are our word vector and context vector representations, respectively, with respective biases (intercepts) $b_i$ and $\tilde{b}_j$. These are the parameters we seek to learn with our model.
3) $\log X_{ij}$ comes from taking the log of the probabilities $P_{ij} = P(j \mid i) = X_{ij} / X_i$ and noting that the denominator $X_i$ is independent of $j$ and can thus be absorbed into $b_i$.

Reproducing the full derivation of equation (1) is beyond the scope of this paper, but it can be found in [2]. The cost is minimized numerically using gradient descent, yielding the desired word embeddings.

For our purposes, we consider each diagnosis or procedure code to be a word and the sequence of codes for a given patient to be a document in a corpus. We then train GloVe on our corpus of patients, separating patients with a few instances of a dummy code to prevent the model from treating codes from unrelated patients as co-occurring.

While there is no standardized metric for evaluating the quality of diagnosis code embeddings, a visual inspection suggests that our embeddings are quite accurate. An example of the nearest neighbors of diabetes is shown in Table I. The first seven of these results are quite obviously related to diabetes. Of the last three, while diabetes is well known for impairing vision by reducing the clarity of the eye's lens, the connection to myopia is subtle.
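As a concrete illustration of the GloVe cost functional (1) and its weighting function $f$ from Section III-A, the following NumPy sketch evaluates $J$ for a given co-occurrence matrix and parameter set. The function names and array layout are assumptions made for this example; it computes the cost only and is not a training routine or the authors' implementation. Pairs with $X_{ij} = 0$ are skipped, since $f(0) = 0$ and $\log 0$ is undefined.

```python
import numpy as np


def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting function f: (x / x_max)^alpha up to x_max, then capped at 1."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= x_max, (x / x_max) ** alpha, 1.0)


def glove_cost(X, W, W_ctx, b, b_ctx):
    """Evaluate the cost J of equation (1).

    X      : (V, V) co-occurrence counts, X[i, j] = count of j in context of i
    W      : (V, d) word vectors w_i
    W_ctx  : (V, d) context vectors (w-tilde)_j
    b      : (V,)   word biases b_i
    b_ctx  : (V,)   context biases (b-tilde)_j
    """
    X = np.asarray(X, dtype=float)
    observed = X > 0  # zero counts carry zero weight, so they are skipped
    log_X = np.zeros_like(X)
    log_X[observed] = np.log(X[observed])
    residual = W @ W_ctx.T + b[:, None] + b_ctx[None, :] - log_X
    weights = glove_weight(X)
    return float(np.sum(weights[observed] * residual[observed] ** 2))
```

With all parameters set to zero, the residual for each observed pair reduces to $-\log X_{ij}$, which gives a simple sanity check on the weighting before any optimizer is attached.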
A justification for the link between diabetes and myopia has been established quite recently and is currently an area of active research [3], [4], [5]. Recent studies have also shown that diabetics are more likely to suffer from nail fungi, such as dermatophytosis of the nail, which accounts for both (8) and (10) [6]. This last result is particularly compelling since the effect of diabetes on dermatophytosis has been disputed over the years and has only been established relatively recently as a medical fact [7].

TABLE I
NEAREST NEIGHBORS OF DIABETES MELLITUS GIVEN BY EMBEDDED POINTS.

Diabetes Mellitus Nearest Neighbors
1 Diabetes mellitus without mention of complication, type I [juvenile type], uncontrolled
2 Diabetes mellitus without mention of complication, type II or unspecified type, uncontrolled
3 Diabetes with neurological manifestations, type II or unspecified type, not stated as uncontrolled
4 Diabetes with renal manifestations, type II or unspecified type, not stated as uncontrolled
5 Diabetes with ophthalmic manifestations, type I [juvenile type], not stated as uncontrolled
6 Diabetes with ophthalmic manifestations, type II or unspecified type, not stated as uncontrolled
7 Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled
8 Dermatophytosis of nail
9 Myopia
10 Other specified diseases of nail

B. Clustering

Once we have obtained embeddings for our data, we cluster them using the K-Means algorithm, since agglomerative clustering methods are too computationally intensive to cluster our data efficiently. We find these clusters using the following two methods:

1) Let the initial cluster centroids be the means of the embedded vector representations for each CCS group represented in our data.
This presents an intuitive choice for clusters, but also fixes the number of clusters at 259, which could be unnecessarily restrictive.
2) Choose initial centroids according to the k-means++ procedure described in [8] as follows:
a) Randomly choose the first centroid to be a point from the dataset to be clustered.
b) For each successive centroid, pick $x_j$ to be the next centroid with probability
$$p_j = \frac{D(x_j)^2}{\sum_{i=1}^{n} D(x_i)^2},$$
where $D(x_j)$ is the distance from $x_j$ to the nearest centroid already chosen. This step is repeated until $k$ centroids have been chosen.

Fig. 1 shows a visualization of various 2-dimensional projections of the clusters found in our data using t-distributed stochastic neighbor embedding (t-SNE), principal component analysis (PCA), and linear discriminant projections. To make this figure more meaningful, we limit our plots to clusters pertaining to select diagnoses. We select these clusters by picking a major diagnosis code representative of the disease in question and choosing the cluster to which it belongs. Though limited by low dimensionality, one can observe that at least some of the data form relatively distinct clusters, especially under the t-SNE projection. For the sake of brevity, Fig. 1 shows only data clustered using the means of CCS groupings as centroids because the clusters generated using k-means++ are qualitatively similar.

Fig. 1. Visual comparison of clusters with initial centroids set at means of CCS groupings. The results using k-means++ centroids are qualitatively similar.

IV. RESULTS

We now return to our initial question of whether clusters of embedded diagnoses can help us identify comorbidities in our data. Since clustering is an inherently unsupervised task, we acknowledge that we do not have a simple, absolute metric by which to assess cluster validity. Many conditions may be comorbidities of multiple diseases, yet we restrict each to exactly one cluster, and we do not have exhaustive information on known comorbidities. In the absence of such data, we heuristically assess both cluster validity and the presence of comorbidities by inspecting a few key clusters element by element and attempting to determine how closely each data point is related to the main theme of the cluster.

To illustrate how this procedure works, consider the cluster containing the diagnosis code for advanced chronic kidney disease, known as end-stage renal disease (ESRD). Once an individual enters end-stage kidney disease, his or her kidneys have lost so much function that dialysis is required multiple times a week to properly filter toxins from the blood. Worse, chronic kidney disease is irreversible, so individuals in end-stage must either receive a kidney transplant or receive dialysis for the rest of their lives. Thus, we would expect codes in our ESRD cluster to be related to advanced kidney damage, dialysis, kidney transplants, and associated conditions.

Table II summarizes codes contained in our ESRD cluster. It is readily apparent that the entries in the table correspond directly to renal problems, with the possible exception of entries 13 and 14. Further inspection, however, reveals that kidney failure often follows bone marrow transplants [9] and also that impaired kidney function is often a symptom of lymphoma [10]. Thus we see that each of the codes we investigated from our ESRD cluster is closely linked to ESRD, which lends strength to the hypothesis that clusterings could be useful in identifying disease comorbidities. We further see that combining clustering with our embeddings captures relationships between diagnoses and procedures not captured by CCS groupings alone, as can be seen from the diversity of CCS groupings in Table II.

TABLE II
SUBSET OF DIAGNOSES AND PROCEDURES CONTAINED IN CLUSTER CORRESPONDING TO END-STAGE RENAL DISEASE. NOTE THAT ALL BUT TWO OF THESE CONDITIONS EXPLICITLY RELATE TO RENAL PROBLEMS OR DIALYSIS.

No. | End-Stage Renal Disease Related Code | CCS Group
1 | Kidney replaced by transplant |
2 | Chronic kidney disease, unspecified |
3 | Chronic kidney disease, Stage IV (severe) |
4 | Anemia in chronic kidney disease | 59
5 | Diabetes with renal manifestations, type II or unspecified type, not stated as uncontrolled | 50
6 | Hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage I through stage IV, or unspecified | 99
7 | Chronic kidney disease, Stage II (mild) |
8 | Chronic kidney disease, Stage V |
9 | Secondary hyperparathyroidism (of renal origin) |
10 | Diabetes with renal manifestations, type I [juvenile type], not stated as uncontrolled |
11 | Hypertensive chronic kidney disease, benign, with chronic kidney disease stage I through stage IV, or unspecified | 99
12 | Hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage V or end stage renal disease | 99
13 | Other malignant lymphomas, lymph nodes of multiple sites |
14 | Complications of transplanted bone marrow |
15 | Complications of transplanted kidney |
16 | Polycystic kidney, unspecified type |
17 | End stage renal disease | 158
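The two centroid initializations of Section III-B and the cluster inspection used in this section can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation behind the reported results: all function names are hypothetical, a plain Lloyd iteration stands in for whichever K-Means routine was actually used, and the k-means++ seeding uses the squared-distance weighting of [8] and assumes the data contain at least k distinct points.

```python
import numpy as np


def group_means(vectors, groups):
    """Method 1: one initial centroid per group (e.g. CCS group), at its mean."""
    groups = np.asarray(groups)
    labels = sorted(set(groups.tolist()))
    centroids = np.vstack([vectors[groups == g].mean(axis=0) for g in labels])
    return labels, centroids


def kmeanspp_init(vectors, k, rng):
    """Method 2: k-means++ seeding with probability proportional to D(x)^2."""
    n = len(vectors)
    first = rng.integers(n)
    centroids = [vectors[first]]
    d2 = ((vectors - vectors[first]) ** 2).sum(axis=1)
    while len(centroids) < k:
        idx = rng.choice(n, p=d2 / d2.sum())
        centroids.append(vectors[idx])
        # Keep, for every point, the squared distance to its nearest centroid.
        d2 = np.minimum(d2, ((vectors - vectors[idx]) ** 2).sum(axis=1))
    return np.vstack(centroids)


def kmeans(vectors, init_centroids, n_iter=100):
    """Lloyd iterations from the given centroids until assignments stop changing."""
    centroids = np.array(init_centroids, dtype=float)
    assign = None
    for _ in range(n_iter):
        # Squared Euclidean distances via the expansion |v|^2 - 2 v.c + |c|^2.
        d2 = ((vectors ** 2).sum(axis=1)[:, None]
              - 2.0 * vectors @ centroids.T
              + (centroids ** 2).sum(axis=1)[None, :])
        new_assign = d2.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break
        assign = new_assign
        for c in range(len(centroids)):
            members = vectors[assign == c]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                centroids[c] = members.mean(axis=0)
    return assign, centroids


def cluster_of(code, codes, assign):
    """List every code that shares a cluster with the given code."""
    target = assign[codes.index(code)]
    return [c for c, a in zip(codes, assign) if a == target]
```

Calling `cluster_of` with a representative code, such as the one for ESRD, returns the list that would then be inspected element by element. Because the group means are used only as starting points, a code may end up in a cluster dominated by a different CCS group than its own.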
We note that while the results of clustering around ESRD are promising, other diagnoses exhibit worse results. Burns, for example, are clustered together with insect bites, presumably because both involve skin irritation and blistering. In spite of this, we believe that our clusters could be a useful tool because we would expect chronic diseases (e.g., kidney disease) to show more consistent, long-term patterns of comorbidities than incidental injuries (e.g., burns). A broader, more systematic exploration of these patterns is a potential area of future research.

V. CONCLUSION

Vector space embeddings can be a powerful means of mining meaning from medical text data. Using the GloVe algorithm to create 25-dimensional embeddings of medical diagnosis codes, we cluster diagnoses and use these clusters to identify related diseases, even when they fall in different CCS categories. This success indicates that embeddings capture some level of inherent meaning present in the diagnosis codes, suggesting that these embeddings could be useful features for disease prediction algorithms. This is a promising area for future research.

VI. ACKNOWLEDGEMENTS

This work was supported in part by the National Science Foundation, Grant Number and the Defense Threat Reduction Agency, Grant Number HDRTA

REFERENCES

[1] Y. Choi, C. Y.-I. Chiu, and D. Sontag, "Learning low-dimensional representations of medical concepts," in AMIA Summits on Translational Science Proceedings, 2016, pp
[2] J. Pennington, R. Socher, and C. D. Manning, "GloVe: Global vectors for word representation," in EMNLP, vol. 14, 2014, pp
[3] H. C. Fledelius, "Myopia and diabetes mellitus with special reference to adult-onset myopia," Acta Ophthalmologica, vol. 64, no. 1, pp , Feb
[4] M. A. A. Paul Chous, "Ocular manifestations of diabetes: Some clues for eyecare professionals," May [Online]. Available: modernmedicine.com/optometrytimes/content/tags/diabetes/ocular-manifestations-diabetes-some-clues-eyecare-professionals
[5] M. Young, "Connecting diabetes and myopia," April [Online]. Available: article-connecting-diabetes-and-myopia
[6] T. C. Vlahovic and J. A. Sebag, Onychomycosis in Diabetics. Cham: Springer International Publishing, 2017, pp [Online]. Available: 17
[7] A. Lugo-Somolinos and J. Sanches, "Prevalence of dermatophytosis in patients with diabetes," Journal of the American Academy of Dermatology, vol. 26, pp , March

[8] D. Arthur and S. Vassilvitskii, "K-means++: The advantages of careful seeding," in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, ser. SODA '07. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2007, pp [Online]. Available:
[9] B. Pulla, Y. Barry, and A. E., "Acute renal failure following bone marrow transplantation," Renal Failure, vol. 20, pp , May
[10] L. J. Cohen, H. G. Rennke, J. P. Laubach, and B. D. Humphreys, "The spectrum of kidney involvement in lymphoma: a case report and review of the literature," American Journal of Kidney Diseases: The Official Journal of the National Kidney Foundation, vol. 56, pp ,
[11] L. van der Maaten and G. Hinton, "Visualizing high-dimensional data using t-SNE,"
[12] M. K. Kuhlmann, A. Kribben, M. Wittwer, and W. H. Hörl, "OPTA-malnutrition in chronic renal failure," Nephrology Dialysis Transplantation, vol. 22, no. 3, p. iii13, [Online]. Available:
[13] H. C. I. I. Institute. (2017) Prometheus analytics.
[14] F. Hildebrant, "Renal medicine 1: Genetic kidney diseases," The Lancet, vol. 375, no. 9722, pp
[15] S. J. Ryu, "Intracranial hemorrhage in patients with polycystic kidney disease," Stroke, vol. 21, no. 2, pp , [Online]. Available:
[16] W. McKinney, "Data structures for statistical computing in Python," in Proceedings of the 9th Python in Science Conference, S. van der Walt and J. Millman, Eds., 2010, pp
[17] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp ,
[18] World Health Organization et al., "International classification of diseases: [9th] ninth revision, basic tabulation list with alphabetic index,"

Automated Estimation of mts Score in Hand Joint X-Ray Image Using Machine Learning

Automated Estimation of mts Score in Hand Joint X-Ray Image Using Machine Learning Automated Estimation of mts Score in Hand Joint X-Ray Image Using Machine Learning Shweta Khairnar, Sharvari Khairnar 1 Graduate student, Pace University, New York, United States 2 Student, Computer Engineering,

More information

Quantile Regression for Final Hospitalization Rate Prediction

Quantile Regression for Final Hospitalization Rate Prediction Quantile Regression for Final Hospitalization Rate Prediction Nuoyu Li Machine Learning Department Carnegie Mellon University Pittsburgh, PA 15213 nuoyul@cs.cmu.edu 1 Introduction Influenza (the flu) has

More information

arxiv: v1 [stat.ml] 24 Aug 2017

arxiv: v1 [stat.ml] 24 Aug 2017 An Ensemble Classifier for Predicting the Onset of Type II Diabetes arxiv:1708.07480v1 [stat.ml] 24 Aug 2017 John Semerdjian School of Information University of California Berkeley Berkeley, CA 94720 jsemer@berkeley.edu

More information

A Comparison of Collaborative Filtering Methods for Medication Reconciliation

A Comparison of Collaborative Filtering Methods for Medication Reconciliation A Comparison of Collaborative Filtering Methods for Medication Reconciliation Huanian Zheng, Rema Padman, Daniel B. Neill The H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA, 15213,

More information

Early Detection of Diabetes from Health Claims

Early Detection of Diabetes from Health Claims Early Detection of Diabetes from Health Claims Rahul G. Krishnan, Narges Razavian, Youngduck Choi New York University {rahul,razavian,yc1104}@cs.nyu.edu Saul Blecker, Ann Marie Schmidt NYU School of Medicine

More information

Feature Engineering for Depression Detection in Social Media

Feature Engineering for Depression Detection in Social Media Maxim Stankevich, Vadim Isakov, Dmitry Devyatkin and Ivan Smirnov Institute for Systems Analysis, Federal Research Center Computer Science and Control of RAS, Moscow, Russian Federation Keywords: Abstract:

More information

Statistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes.

Statistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes. Final review Based in part on slides from textbook, slides of Susan Holmes December 5, 2012 1 / 1 Final review Overview Before Midterm General goals of data mining. Datatypes. Preprocessing & dimension

More information

Comparing Machine Learning Clustering with Latent Class Analysis on Cancer Symptoms Data

Comparing Machine Learning Clustering with Latent Class Analysis on Cancer Symptoms Data Comparing Machine Learning Clustering with Latent Class Analysis on Cancer Symptoms Data Nikolaos Papachristou, Christine Miaskowski, Payam Barnaghi-Senior IEEE Member, Roma Maguire, Nazli Farajidavar-IEEE

More information

Heroes and Villains: What A.I. Can Tell Us About Movies

Heroes and Villains: What A.I. Can Tell Us About Movies Heroes and Villains: What A.I. Can Tell Us About Movies Brahm Capoor Department of Symbolic Systems brahm@stanford.edu Varun Nambikrishnan Department of Computer Science varun14@stanford.edu Michael Troute

More information

Automatic pathology classification using a single feature machine learning - support vector machines

Automatic pathology classification using a single feature machine learning - support vector machines Automatic pathology classification using a single feature machine learning - support vector machines Fernando Yepes-Calderon b,c, Fabian Pedregosa e, Bertrand Thirion e, Yalin Wang d,* and Natasha Leporé

More information

Comparison of discrimination methods for the classification of tumors using gene expression data

Comparison of discrimination methods for the classification of tumors using gene expression data Comparison of discrimination methods for the classification of tumors using gene expression data Sandrine Dudoit, Jane Fridlyand 2 and Terry Speed 2,. Mathematical Sciences Research Institute, Berkeley

More information

Seizure Detection Challenge The Fitzgerald team solution

Seizure Detection Challenge The Fitzgerald team solution Seizure Detection Challenge The Fitzgerald team solution Vincent Adam, Joana Soldado-Magraner, Wittawat Jitkritum, Heiko Strathmann, Balaji Lakshminarayanan, Alessandro Davide Ialongo, Gergő Bohner, Ben

More information

Gene Selection for Tumor Classification Using Microarray Gene Expression Data

Gene Selection for Tumor Classification Using Microarray Gene Expression Data Gene Selection for Tumor Classification Using Microarray Gene Expression Data K. Yendrapalli, R. Basnet, S. Mukkamala, A. H. Sung Department of Computer Science New Mexico Institute of Mining and Technology

More information

Mammogram Analysis: Tumor Classification

Mammogram Analysis: Tumor Classification Mammogram Analysis: Tumor Classification Term Project Report Geethapriya Raghavan geeragh@mail.utexas.edu EE 381K - Multidimensional Digital Signal Processing Spring 2005 Abstract Breast cancer is the

More information

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models White Paper 23-12 Estimating Complex Phenotype Prevalence Using Predictive Models Authors: Nicholas A. Furlotte Aaron Kleinman Robin Smith David Hinds Created: September 25 th, 2015 September 25th, 2015

More information

Predicting Diabetes and Heart Disease Using Features Resulting from KMeans and GMM Clustering

Predicting Diabetes and Heart Disease Using Features Resulting from KMeans and GMM Clustering Predicting Diabetes and Heart Disease Using Features Resulting from KMeans and GMM Clustering Kunal Sharma CS 4641 Machine Learning Abstract Clustering is a technique that is commonly used in unsupervised

More information

arxiv: v1 [cs.lg] 4 Feb 2019

arxiv: v1 [cs.lg] 4 Feb 2019 Machine Learning for Seizure Type Classification: Setting the benchmark Subhrajit Roy [000 0002 6072 5500], Umar Asif [0000 0001 5209 7084], Jianbin Tang [0000 0001 5440 0796], and Stefan Harrer [0000

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

Social Media Mining to Understand Public Mental Health

Social Media Mining to Understand Public Mental Health Social Media Mining to Understand Public Mental Health Andrew Toulis and Lukasz Golab University of Waterloo, Waterloo, Ontario, Canada N2L 3G1 {aptoulis,lgolab}@uwaterloo.ca Abstract. In this paper, we

More information

Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation

Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation Ryo Izawa, Naoki Motohashi, and Tomohiro Takagi Department of Computer Science Meiji University 1-1-1 Higashimita,

More information

Mammogram Analysis: Tumor Classification

Mammogram Analysis: Tumor Classification Mammogram Analysis: Tumor Classification Literature Survey Report Geethapriya Raghavan geeragh@mail.utexas.edu EE 381K - Multidimensional Digital Signal Processing Spring 2005 Abstract Breast cancer is

More information

Leveraging Data for Targeted Patient Population Health Improvements

Leveraging Data for Targeted Patient Population Health Improvements Leveraging Data for Targeted Patient Population Health Improvements Diabetes Impact on Health in America Diabetes is a major health concern in the United States today. The Centers for Disease Control

More information

Chapter 2: Identification and Care of Patients With CKD

Chapter 2: Identification and Care of Patients With CKD Chapter 2: Identification and Care of Patients With CKD Over half of patients in the Medicare 5% sample (aged 65 and older) had at least one of three diagnosed chronic conditions chronic kidney disease

More information

Discriminant Analysis with Categorical Data

Discriminant Analysis with Categorical Data - AW)a Discriminant Analysis with Categorical Data John E. Overall and J. Arthur Woodward The University of Texas Medical Branch, Galveston A method for studying relationships among groups in terms of

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 1, Jan Feb 2017

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 1, Jan Feb 2017 RESEARCH ARTICLE Classification of Cancer Dataset in Data Mining Algorithms Using R Tool P.Dhivyapriya [1], Dr.S.Sivakumar [2] Research Scholar [1], Assistant professor [2] Department of Computer Science

More information

Chapter 2: Identification and Care of Patients with CKD

Chapter 2: Identification and Care of Patients with CKD Chapter 2: Identification and Care of Patients with CKD Over half of patients in the Medicare 5% sample (aged 65 and older) had at least one of three diagnosed chronic conditions chronic kidney disease

More information

Does Machine Learning. In a Learning Health System?

Does Machine Learning. In a Learning Health System? Does Machine Learning Have a Place In a Learning Health System? Grand Rounds: Rethinking Clinical Research Friday, December 15, 2017 Michael J. Pencina, PhD Professor of Biostatistics and Bioinformatics,

More information

NMF-Density: NMF-Based Breast Density Classifier

NMF-Density: NMF-Based Breast Density Classifier NMF-Density: NMF-Based Breast Density Classifier Lahouari Ghouti and Abdullah H. Owaidh King Fahd University of Petroleum and Minerals - Department of Information and Computer Science. KFUPM Box 1128.

More information

Gray level cooccurrence histograms via learning vector quantization

Gray level cooccurrence histograms via learning vector quantization Gray level cooccurrence histograms via learning vector quantization Timo Ojala, Matti Pietikäinen and Juha Kyllönen Machine Vision and Media Processing Group, Infotech Oulu and Department of Electrical

More information

Predictive Diagnosis. Clustering to Better Predict Heart Attacks x The Analytics Edge

Predictive Diagnosis. Clustering to Better Predict Heart Attacks x The Analytics Edge Predictive Diagnosis Clustering to Better Predict Heart Attacks 15.071x The Analytics Edge Heart Attacks Heart attack is a common complication of coronary heart disease resulting from the interruption

More information

Mapping Patient Trajectories using Longitudinal Extraction and Deep Learning in the MIMIC-III Critical Care Database *

Mapping Patient Trajectories using Longitudinal Extraction and Deep Learning in the MIMIC-III Critical Care Database * Mapping Patient Trajectories using Longitudinal Extraction and Deep Learning in the MIMIC-III Critical Care Database * Brett K. Beaulieu-Jones 1, Patryk Orzechowski 1,2 and Jason H. Moore 1 1 Computational

More information

NQF-ENDORSED VOLUNTARY CONSENSUS STANDARD FOR HOSPITAL CARE. Measure Information Form Collected For: CMS Outcome Measures (Claims Based)

NQF-ENDORSED VOLUNTARY CONSENSUS STANDARD FOR HOSPITAL CARE. Measure Information Form Collected For: CMS Outcome Measures (Claims Based) Last Updated: Version 4.3 NQF-ENDORSED VOLUNTARY CONSENSUS STANDARD FOR HOSPITAL CARE Measure Information Form Collected For: CMS Outcome Measures (Claims Based) Measure Set: CMS Readmission Measures Set

More information

Introduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018

Introduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018 Introduction to Machine Learning Katherine Heller Deep Learning Summer School 2018 Outline Kinds of machine learning Linear regression Regularization Bayesian methods Logistic Regression Why we do this

More information

Supplementary Materials Extracting a Cellular Hierarchy from High-dimensional Cytometry Data with SPADE

Supplementary Materials Extracting a Cellular Hierarchy from High-dimensional Cytometry Data with SPADE Supplementary Materials Extracting a Cellular Hierarchy from High-dimensional Cytometry Data with SPADE Peng Qiu1,4, Erin F. Simonds2, Sean C. Bendall2, Kenneth D. Gibbs Jr.2, Robert V. Bruggner2, Michael

More information

Prediction of Malignant and Benign Tumor using Machine Learning

Prediction of Malignant and Benign Tumor using Machine Learning Prediction of Malignant and Benign Tumor using Machine Learning Ashish Shah Department of Computer Science and Engineering Manipal Institute of Technology, Manipal University, Manipal, Karnataka, India

More information

Predicting Kidney Cancer Survival from Genomic Data

Predicting Kidney Cancer Survival from Genomic Data Predicting Kidney Cancer Survival from Genomic Data Christopher Sauer, Rishi Bedi, Duc Nguyen, Benedikt Bünz Abstract Cancers are on par with heart disease as the leading cause for mortality in the United

More information

Chapter 2: Identification and Care of Patients With CKD

Chapter 2: Identification and Care of Patients With CKD Chapter 2: Identification and Care of Patients With Over half of patients from the Medicare 5% sample (restricted to age 65 and older) have a diagnosis of chronic kidney disease (), cardiovascular disease,

More information
