PhenDisco: a new phenotype discovery system for the database of genotypes and phenotypes


1 PhenDisco: a new phenotype discovery system for the database of genotypes and phenotypes
Son Doan, Hyeoneui Kim
Division of Biomedical Informatics, University of California San Diego
Open Access Journal Club, 09/05/2013

2 Roadmap to the Presentation
- Background
  - dbGaP
  - Challenges in using dbGaP
  - PFINDR program
- PhenDisco development
  - User requirement analysis for PhenDisco
  - Data standardization (variables, study metadata)
  - System development: technical details
- PhenDisco demo
- Performance evaluation

3 Background

4 Overview of dbGaP
Database of Genotypes and Phenotypes
- Developed by NCBI
- Stores and distributes the data and outputs of studies on the interactions of genotypes and phenotypes
- Provides two levels of access
  - Open access: variable information, including summary statistics, and study information
  - Controlled access: raw data, upon approval by an NIH DAC

5 A Typical Challenge in Using dbGaP
"Potentially, dbGaP is great: it contains so many different types of studies and their data! However, I find it very hard to reuse dbGaP data because there is no easy but robust way to filter studies by important study-related information such as study design, analysis methods, and analysis data produced by the studies. Even if I find studies that seem to fit my needs, I still need to make sure that they have the genotype and/or phenotype information that I need. Of course, dealing with data values in all sorts of different formats is another challenge to go through."
(Erin Smith, PhD, Division of Genome Information Science, UCSD)


9 PFINDR (Phenotype Finding IN Data Repositories)
- Funded by NHLBI
- Goal: facilitate dbGaP use by improving the accuracy and completeness of search returns
  - Standardized phenotype variables
  - Searchable study-related information

10 User Requirement Analysis

11 Use-Case Driven Development
User requirements collected from:
- Analysis of data use descriptions from data requests available in dbGaP (14,287 requests)
- Online user survey (17 users)
- User interviews (8 local dbGaP users)
- Recommendations and suggestions from NIH officers and the Scientific Advisory Board

12 Data Request Analysis
[Figure: distribution of topics in the analyzed data requests. Labeled shares: Neoplasm/Cancer (30%), Psychiatric Disease (13%), Congenital Abnormality (8.6%), Cardiovascular Disease (8.1%). Other category labels in the figure: Health Care Activity; Daily Function or Activity; Food; Organism; Social Function; Behavior; Mental Process; Qualitative Concept; Mood, Emotion, and Individual Behavior; Disease; Genetic Disease; Clinical Attributes; Diagnostic Procedure; Signs or Symptoms; Pathologic Function; Laboratory Procedure or Test; Research Activity; Chemical or Biological Substance; Therapeutic or Preventive Procedure; Other.]

13 Interviews, Survey, and SAB/NIH Officer Feedback
Functions that maximize search efficiency. Examples:
- Option to expand search terms through synonyms
- Studies displayed in order of relevance
- Ability to select studies from the returned list and save them for later review
- Search results organized in a way that supports quick browsing

14 Problems We Addressed
Focus areas:
- Completeness and accuracy of search results
  - Abbreviation expansion
  - Concept-based search
- Ease of result review
  - Sorting the results by relevance
  - Highlighting search keywords in the retrieved records
- Additional functionality
  - Export of selected study and variable information
  - Categorization of variables

15 Data Standardization
- Variable standardization
- Study-level metadata generation

16 Phenotype Variable Standardization
Example variable:
  Variable ID:          Phv v2.p2
  Variable Name:        C41RPACE
  Variable Description: Get pain when walk at ordinary pace?
- Used variable descriptions
- Focused on identifying
  - Topic (main theme: "pain", "walking")
  - Subject of information (SOI), i.e., the bearer: "study subject"
- Mapped the topic and SOI concepts to the UMLS Metathesaurus


22 Phenotype Variable Standardization
Input: variable descriptions (135,608 variables). Running example: "77 age mom diagnosed stroke (tia)"
1. Normalization
   - Spell out abbreviations and shorthand expressions
   - Drop question numbers and other unimportant characters
   - Result: "age mother diagnosed stroke (tia)"
2. MetaMap processing
   - Generate CUIs, concept names, and semantic types
   - Result: C : age [organism attribute]; C : Mother [family group]; C : Stroke [disease or syndrome]
3. Semantic role assignment
   - Role identification based on semantic types and keywords
   - Evaluation on a random sample of 500: 73% accuracy
   - Result: C : age and C : Stroke are topics; C : Mother is the subject of information
4. Variable categorization
   - Categorization based on semantic types and keywords
   - Evaluation on a random sample of 500: 71% accuracy
   - Result: family history, demographics
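The normalization step can be sketched in a few lines of Python. This is only an illustration using the slide's running example: the abbreviation table and the function name normalize_description are assumptions, and the lexicon and rules PhenDisco actually uses are not given in the slides.

```python
import re

# Hypothetical abbreviation table; the real PhenDisco lexicon is not shown in the slides.
ABBREVIATIONS = {
    "mom": "mother",
    "dad": "father",
    "dx": "diagnosis",
    "hx": "history",
}


def normalize_description(text: str) -> str:
    """Lowercase, drop a leading question number, and expand shorthand tokens."""
    text = text.strip().lower()
    # Drop a leading question number such as "77 " or "12a. ".
    text = re.sub(r"^\d+[a-z]?[.)]?\s+", "", text)
    # Expand whole-token abbreviations; tokens such as "(tia)" are left untouched.
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


print(normalize_description("77 age mom diagnosed stroke (tia)"))
# prints: age mother diagnosed stroke (tia)
```

A rule this simple would also strip meaningful leading numbers (for example "2 hour glucose"), so a real pipeline would need a more careful test for question numbers.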

23 Category Examples
Gender of the participant
  Topics: gender
  Subject of information: study subject
  Variable categories: Demographics
Last known smoking status
  Topics: smoking
  Subject of information: study subject
  Variable categories: Smoking History
Cigarettes/day, exam 1
  Topics: smoking, medical examination
  Subject of information: study subject
  Variable categories: Smoking History; Healthcare Activity Finding
Age in years at uric acid measurement
  Topics: age, uric acid measurement
  Subject of information: study subject
  Variable categories: Demographics; Lab Tests
AGE of living mother
  Topics: age
  Subject of information: mother
  Variable categories: Demographics - Family
Age at dementia onset as defined by the DSM IV definition
  Topics: age, dementia
  Subject of information: study subject
  Variable categories: Demographics; Medical History
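The split between topics and subject of information can be illustrated with a small rule-based sketch. The single rule used here (a concept with the semantic type "family group" is the subject of information, everything else is a topic, and the subject defaults to "study subject") is an assumption inferred from the examples on the slides; the actual PhenDisco rules also use keywords and are not spelled out.

```python
from typing import Dict, List, Tuple

# Assumed rule set: semantic types that mark the bearer of the information.
SUBJECT_SEMANTIC_TYPES = {"family group"}
DEFAULT_SUBJECT = "study subject"


def assign_roles(concepts: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Split (concept name, semantic type) pairs into topics and subjects."""
    topics = [name for name, sem_type in concepts
              if sem_type.lower() not in SUBJECT_SEMANTIC_TYPES]
    subjects = [name for name, sem_type in concepts
                if sem_type.lower() in SUBJECT_SEMANTIC_TYPES]
    return {
        "topic": topics,
        # Fall back to the study subject when no other bearer is mentioned.
        "subject_of_information": subjects or [DEFAULT_SUBJECT],
    }


roles = assign_roles([
    ("age", "organism attribute"),
    ("Mother", "family group"),
    ("Stroke", "disease or syndrome"),
])
print(roles)
# prints: {'topic': ['age', 'Stroke'], 'subject_of_information': ['Mother']}
```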

24 Phenotype Variable Standardization
The pipeline is the same as on slide 22 (normalization, MetaMap processing, semantic role assignment, variable categorization), with one additional step that is in progress:
5. Identification of similar variables
   - Same CUI, similar keywords, and same category

25 Study-Level Metadata Annotation
- Manual annotation of 422 studies (as of 07/31/13)
- Metadata items generated
  - Disease topics (encoded with UMLS)
  - Geographical information (encoded with ISO subdivision codes: state and country)
  - IRB approval (required or not)
  - Consent type (not restricted, restricted, unspecified)
  - Sample demographics (race and/or ethnicity, gender, age)
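One way to picture the annotated record is as a small data structure with one field per metadata item listed above. The class and field names below are hypothetical, and the example values are placeholders rather than real annotations.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class ConsentType(Enum):
    NOT_RESTRICTED = "not restricted"
    RESTRICTED = "restricted"
    UNSPECIFIED = "unspecified"


@dataclass
class StudyMetadata:
    """Hypothetical container for the study-level metadata items on the slide."""
    study_id: str
    disease_cuis: List[str] = field(default_factory=list)   # UMLS concept identifiers
    geography: List[str] = field(default_factory=list)      # ISO subdivision codes
    irb_required: Optional[bool] = None                      # None means not annotated
    consent_type: ConsentType = ConsentType.UNSPECIFIED
    races_ethnicities: List[str] = field(default_factory=list)
    genders: List[str] = field(default_factory=list)
    age_range: Optional[Tuple[int, int]] = None              # (min, max) in years


# Placeholder values for illustration only.
record = StudyMetadata(
    study_id="example-study",
    disease_cuis=["C0000000"],
    geography=["US-CA"],
    irb_required=True,
    genders=["female", "male"],
    age_range=(40, 80),
)
print(record.consent_type.value)
# prints: unspecified
```

Keeping consent type as an enumeration restricts it to the three values named on the slide, which makes the field easy to expose as a search filter.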

26 System Development: Integration

27 PhenDisco: Put-it-all-together
[Architecture diagram. Components shown: dbGaP; NLP tools + MetaMap; information model mapping; sdGaP; free-text query; query parser; BM25 ranking algorithm; relevant studies; ranked studies.]

28 System Development: Query Parser

29 Contextual Query Language
- Query types
  - Simple queries: keywords, phrases
  - Boolean logic: AND, OR, NOT
  - Index values, e.g., age > 40
- Language guideline built in BNF form

30 BNF form
  cqlquery ::= prefixassignment cqlquery | scopedclause
  prefixassignment ::= '>' prefix '=' uri | '>' uri
  scopedclause ::= scopedclause booleangroup searchclause | searchclause
  booleangroup ::= boolean [modifierlist]
  boolean ::= 'and' | 'or' | 'not' | 'prox'
  searchclause ::= '(' cqlquery ')' | index relation searchterm | searchterm
  relation ::= comparitor [modifierlist]
  comparitor ::= comparitorsymbol | namedcomparitor
  comparitorsymbol ::= '=' | '>' | '<' | '>=' | '<=' | '<>' | '=='
  namedcomparitor ::= identifier
  modifierlist ::= modifierlist modifier | modifier
  modifier ::= '/' modifiername [comparitorsymbol modifiervalue]
  prefix, uri, modifiername, modifiervalue, searchterm, index ::= term
  term ::= identifier | 'and' | 'or' | 'not' | 'prox' | 'sortby'
  identifier ::= charstring1 | charstring2
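To make the grammar concrete, here is a minimal recursive-descent sketch for a simplified subset of it: search terms, quoted phrases, parentheses, Boolean operators, and index comparisons such as age > 40. It uses only the Python standard library and is not the PhenDisco parser (the slides name pyparsing as the toolkit used); prefix assignments, modifiers, and sortby are left out, and all function and class names are illustrative.

```python
import re

TOKEN_RE = re.compile(r'\s*("[^"]*"|>=|<=|<>|==|[()=<>]|[^\s()=<>"]+)')
BOOLEANS = {"and", "or", "not", "prox"}
COMPARATORS = {"=", ">", "<", ">=", "<=", "<>", "=="}


def tokenize(query):
    """Split a query into phrases, parentheses, comparators, and words."""
    tokens, pos = [], 0
    query = query.strip()
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of query")
        self.pos += 1
        return token

    def scoped_clause(self):
        # scopedclause ::= scopedclause booleangroup searchclause | searchclause
        # The left recursion becomes a loop, so operators group left to right.
        node = self.search_clause()
        while self.peek() is not None and self.peek().lower() in BOOLEANS:
            operator = self.take().lower()
            node = (operator, node, self.search_clause())
        return node

    def search_clause(self):
        # searchclause ::= '(' cqlquery ')' | index relation searchterm | searchterm
        token = self.take()
        if token == ")" or token in COMPARATORS:
            raise ValueError(f"unexpected token {token!r}")
        if token == "(":
            node = self.scoped_clause()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if self.peek() in COMPARATORS:
            comparator = self.take()
            return ("cmp", token, comparator, self.take().strip('"'))
        return ("term", token.strip('"'))


def parse(query):
    """Parse a query string into a nested tuple tree."""
    parser = Parser(tokenize(query))
    tree = parser.scoped_clause()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse('"breast cancer" AND age > 40'))
# prints: ('and', ('term', 'breast cancer'), ('cmp', 'age', '>', '40'))
```

The resulting tree separates free-text terms, which would go to the text index, from index comparisons, which would become structured filters.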

31 System Development: Study Ranking

32 BM25 ranking algorithm
Symbols used in the ranking formula:
- N: total number of studies
- n_t: number of studies that contain the term t
- c: a field in study d
- w_c: boost factor for field c
- tf: term frequency
- idf: inverse document frequency
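The formula itself did not survive in this transcript, so the following is only a sketch of a field-weighted BM25 score built from the symbols listed above. The parameter defaults (k1 = 1.2, b = 0.75), the non-negative idf variant, the whitespace tokenization, and the function name bm25f_scores are all assumptions, not the PhenDisco settings.

```python
import math
from collections import Counter
from typing import Dict, List

Study = Dict[str, str]  # field name -> text, e.g. {"title": ..., "description": ...}


def bm25f_scores(query: str, studies: List[Study], boosts: Dict[str, float],
                 k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each study against the query with a field-weighted BM25 variant."""
    terms = query.lower().split()
    n_studies = len(studies)

    # Weighted term frequency per study: sum over fields c of w_c * tf(t, c, d).
    weighted_tf: List[Counter] = []
    lengths: List[float] = []
    for study in studies:
        counts: Counter = Counter()
        length = 0.0
        for field_name, text in study.items():
            w_c = boosts.get(field_name, 1.0)
            tokens = text.lower().split()
            length += w_c * len(tokens)
            for token in tokens:
                counts[token] += w_c
        weighted_tf.append(counts)
        lengths.append(length)
    avg_len = sum(lengths) / n_studies if n_studies else 0.0

    scores = [0.0] * n_studies
    for t in terms:
        n_t = sum(1 for counts in weighted_tf if counts[t] > 0)
        idf = math.log(1.0 + (n_studies - n_t + 0.5) / (n_t + 0.5))
        for i, counts in enumerate(weighted_tf):
            tf = counts[t]
            if tf == 0:
                continue
            # Length normalization penalizes long studies that mention a term once.
            norm = 1.0 - b + b * (lengths[i] / avg_len)
            scores[i] += idf * tf * (k1 + 1.0) / (tf + k1 * norm)
    return scores


studies = [
    {"title": "COPD genetics", "description": "lung function study"},
    {"title": "Heart study",
     "description": "copd mentioned once among many other cardiovascular phenotype variables"},
    {"title": "Eye study", "description": "macular degeneration"},
]
scores = bm25f_scores("copd", studies, boosts={"title": 2.0})
ranking = sorted(range(len(studies)), key=lambda i: scores[i], reverse=True)
print(ranking)
# prints: [0, 1, 2]
```

With the title boosted, a study that names the query term in its title outranks one that mentions it in passing in a long description, which is the behavior the boost factor w_c is meant to provide.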

33 Technical Infrastructure
- URL:
- Linux machine: Ubuntu, 64-bit
- Memory: 32 GB RAM
- Database: MySQL
- Web server: Apache
- Programming languages: PHP, Python, JavaScript
- Python toolkits: pyparsing, Whoosh

34 System Demonstration

35 System Evaluation
- Search accuracy
- User interface

36 Evaluation on Basic Search (as of July 7, 2013)

Basic search query                  dbGaP                PhenDisco
                                    Recall   Precision   Recall   Precision
COPD                                100%     41.67%      80.00%   100%
macular degeneration AND white      100%     42.86%      100%     85.71%
breast cancer AND breast density    100%     66.67%      50.00%   100%
schizophrenia                       100%     46.88%      86.67%   92.86%
cardiomyopathy                      100%     35.00%      100%     100%
Average                             100%     46.61%      83.33%   95.71%
Average F-measure                   [values missing]
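For reference, the three measures in the table can be computed as below. This is a generic sketch of the standard definitions, with illustrative function names; it does not reproduce the average F-measure values, which are missing from this transcript and may have been averaged per query.

```python
def precision_recall(retrieved: set, relevant: set):
    """Return (precision, recall) for a set of retrieved study IDs."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# COPD query on dbGaP from the table: precision 41.67%, recall 100%.
print(round(f_measure(0.4167, 1.0), 3))
# prints: 0.588
```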

37 Evaluation on Advanced Search (as of July 7, 2013)

Advanced search query in PhenDisco                                                                  Recall   Precision
macular degeneration AND white AND [whole genome genotyping]                                        100%     66.67%
breast cancer AND breast density AND [IRB not required] AND [whole genome genotyping]               100%     100%
schizophrenia AND [female] AND [AFFY_6.0]                                                           100%     100%
cardiomyopathy AND [copy number variant analysis]                                                   100%     100%
Average                                                                                             100%     [value missing]
Average F-measure                                                                                   [value missing]

38 Feedback on the User Interface (N=6)

39 Trainees
Post-doctoral trainees
- Ko-Wei Lin, DVM, PhD (Study Abstraction, Standardization, Evaluation)
- Mindy Ross, MD, MBA (Study Abstraction, Ontology Building)
- Neda Alipanah, PhD (Ontology Building)
- Xiaoqian Jiang, PhD (Ranking Algorithm)
- Mike Conway, PhD (Study Abstraction)
Undergraduate trainees
- Alexander Hsieh (Standardization)
- Vinay Venkatesh (System Development)
- Rafael Talavera (Evaluation)
- Karen Truong (Study Abstraction)
- Asher Garland (System Development)

40 Acknowledgements
- Lucila Ohno-Machado (PI)
- Collaborator: Hua Xu
- Other contributions: Jihoon Kim, Wendy Chapman, Melissa Tharp
- Staff: Stephanie Feudjio Feupe, MS; Seena Farzaneh, MS; Rebecca Walker, BS
- Funding: UH2HL from NHLBI, NIH

41 Questions? Project Homepage: PhenDisco: Contact:


More information

Health informatics Digital imaging and communication in medicine (DICOM) including workflow and data management

Health informatics Digital imaging and communication in medicine (DICOM) including workflow and data management INTERNATIONAL STANDARD ISO 12052 Second edition 2017-08 Health informatics Digital imaging and communication in medicine (DICOM) including workflow and data management Informatique de santé Imagerie numérique

More information

FINAL REPORT Measuring Semantic Relatedness using a Medical Taxonomy. Siddharth Patwardhan. August 2003

FINAL REPORT Measuring Semantic Relatedness using a Medical Taxonomy. Siddharth Patwardhan. August 2003 FINAL REPORT Measuring Semantic Relatedness using a Medical Taxonomy by Siddharth Patwardhan August 2003 A report describing the research work carried out at the Mayo Clinic in Rochester as part of an

More information

cagrid, cabig, CVRG and NCIBI Joel Saltz MD, PhD Director Center for Comprehensive Informatics

cagrid, cabig, CVRG and NCIBI Joel Saltz MD, PhD Director Center for Comprehensive Informatics cagrid, cabig, CVRG and NCIBI Joel Saltz MD, PhD Director Center for Comprehensive Informatics Biomedical Middleware: cagrid cagrid Components Security (GAARDS) Language (metadata, ontologies) Semantic/Federated

More information

Real-time Summarization Track

Real-time Summarization Track Track Jaime Arguello jarguell@email.unc.edu February 6, 2017 Goal: developing systems that can monitor a data stream (e.g., tweets) and push content that is relevant, novel (with respect to previous pushes),

More information

National Academies Next Generation SAMPLE Researchers TITLE Initiative HERE

National Academies Next Generation SAMPLE Researchers TITLE Initiative HERE National Academies Next Generation SAMPLE Researchers TITLE Initiative HERE Dennis A. Dean, II, PhD Sanofi Auditorium July 13, 2017 sevenbridges.com A little about me Research Experience Analytics and

More information

Sim TwentyFive: An Interactive Visualization System for Data-Driven Decision Support

Sim TwentyFive: An Interactive Visualization System for Data-Driven Decision Support Sim TwentyFive: An Interactive Visualization System for Data-Driven Decision Support David Kale University of Southern California Virtual PICU Children s Hospital LA dkale@chla.usc.edu Brendan Stubbs Department

More information

What Smokers Who Switched to Vapor Products Tell Us About Themselves. Presented by Julie Woessner, J.D. CASAA National Policy Director

What Smokers Who Switched to Vapor Products Tell Us About Themselves. Presented by Julie Woessner, J.D. CASAA National Policy Director What Smokers Who Switched to Vapor Products Tell Us About Themselves Presented by Julie Woessner, J.D. CASAA National Policy Director The CASAA Consumer Testimonials Database Collection began in 2013 through

More information

EBP ASKING. Constructing a Good Clinical Question Using the PICO Format

EBP ASKING. Constructing a Good Clinical Question Using the PICO Format EBP ASKING Constructing a Good Clinical Question Using the PICO Format Objectives: To demonstrate understanding of a good clinical question. To distinguish between a background question, usually answerable

More information

Guidelines for Effective Usage of Text Highlighting Techniques

Guidelines for Effective Usage of Text Highlighting Techniques Guidelines for Effective Usage of Text Highlighting Techniques Hendrik Strobelt, Daniela Oelke, Bum Chul Kwon, Tobias Schreck, Hanspeter Pfister presented by Jordon Johnson 1 Many text vis tools http://textvis.lnu.se/

More information

Hands-On Ten The BRCA1 Gene and Protein

Hands-On Ten The BRCA1 Gene and Protein Hands-On Ten The BRCA1 Gene and Protein Objective: To review transcription, translation, reading frames, mutations, and reading files from GenBank, and to review some of the bioinformatics tools, such

More information

KNOWLEDGE-BASED METHOD FOR DETERMINING THE MEANING OF AMBIGUOUS BIOMEDICAL TERMS USING INFORMATION CONTENT MEASURES OF SIMILARITY

KNOWLEDGE-BASED METHOD FOR DETERMINING THE MEANING OF AMBIGUOUS BIOMEDICAL TERMS USING INFORMATION CONTENT MEASURES OF SIMILARITY KNOWLEDGE-BASED METHOD FOR DETERMINING THE MEANING OF AMBIGUOUS BIOMEDICAL TERMS USING INFORMATION CONTENT MEASURES OF SIMILARITY 1 Bridget McInnes Ted Pedersen Ying Liu Genevieve B. Melton Serguei Pakhomov

More information

DPV. Ramona Ranz, Andreas Hungele, Prof. Reinhard Holl

DPV. Ramona Ranz, Andreas Hungele, Prof. Reinhard Holl DPV Ramona Ranz, Andreas Hungele, Prof. Reinhard Holl Contents Possible use of DPV Languages Patient data Search for patients Patient s info Save data Mandatory fields Diabetes subtypes ICD 10 Fuzzy date

More information

Application of AI in Healthcare. Alistair Erskine MD MBA Chief Informatics Officer

Application of AI in Healthcare. Alistair Erskine MD MBA Chief Informatics Officer Application of AI in Healthcare Alistair Erskine MD MBA Chief Informatics Officer 1 Overview Why AI in Healthcare topic matters Is AI just another shiny objects? Geisinger AI collaborations Categories

More information

How Big Data and Advanced Analytics Can Improve Population Health: Now and In the Near Future

How Big Data and Advanced Analytics Can Improve Population Health: Now and In the Near Future How Big Data and Advanced Analytics Can Improve Population Health: Now and In the Near Future William J. Kassler, MD, MPH Deputy Chief Health Officer Lead Population Health Officer Precision Medicine Completion

More information

HHS Public Access Author manuscript Hum Mutat. Author manuscript; available in PMC 2016 April 16.

HHS Public Access Author manuscript Hum Mutat. Author manuscript; available in PMC 2016 April 16. GeneMatcher: A Matching Tool for Connecting Investigators with an Interest in the Same Gene Nara Sobreira 1,*, François Schiettecatte 2, David Valle 1, and Ada Hamosh 1 1 Institute of Genetic Medicine,

More information

Building Cognitive Computing for Healthcare

Building Cognitive Computing for Healthcare Building Cognitive Computing for Healthcare Boston University Digital Health Initiative Round Table February 27 th 2017 Patrick McNeillie M.D. Clinical Lead and Senior Architect on Watson for Genomics

More information

Delphi Survey Results. MPIs: Drs. William Dale, Arti Hurria, Supriya Mohile

Delphi Survey Results. MPIs: Drs. William Dale, Arti Hurria, Supriya Mohile Delphi Survey Results MPIs: Drs. William Dale, Arti Hurria, Supriya Mohile Cancer and Aging Research Group (CARG) A Delphi Investigation Of Geriatric Oncology Experts Sustainable Infrastructure That Supports

More information

Guide to Use of SimulConsult s Phenome Software

Guide to Use of SimulConsult s Phenome Software Guide to Use of SimulConsult s Phenome Software Page 1 of 52 Table of contents Welcome!... 4 Introduction to a few SimulConsult conventions... 5 Colors and their meaning... 5 Contextual links... 5 Contextual

More information

The Impact of Belief Values on the Identification of Patient Cohorts

The Impact of Belief Values on the Identification of Patient Cohorts The Impact of Belief Values on the Identification of Patient Cohorts Travis Goodwin, Sanda M. Harabagiu Human Language Technology Research Institute University of Texas at Dallas Richardson TX, 75080 {travis,sanda}@hlt.utdallas.edu

More information

Electronic Care Centre for Caregivers of Patients with Alzheimer's disease

Electronic Care Centre for Caregivers of Patients with Alzheimer's disease Electronic Care Centre for Caregivers of Patients with Alzheimer's disease Hippokratis Apostolidis1, Thrasyvoulos Tsiatsos1, Konstantina Karagkiozi2, Tatiana Dimitriou3, Magdalini Tsolaki3 1 Department

More information

Predicting the Effect of Diabetes on Kidney using Classification in Tanagra

Predicting the Effect of Diabetes on Kidney using Classification in Tanagra Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

Big Data Phenomics in the VA. Outline

Big Data Phenomics in the VA. Outline Big Phenomics in the VA Mary Whooley MD Director, VA Measurement Science QUERI San Francisco VA Health Care System University of California, San Francisco Kelly Cho PhD MPH Phenomics Lead, Million Veteran

More information

SAGE. Nick Beard Vice President, IDX Systems Corp.

SAGE. Nick Beard Vice President, IDX Systems Corp. SAGE Nick Beard Vice President, IDX Systems Corp. Sharable Active Guideline Environment An R&D consortium to develop the technology infrastructure to enable computable clinical guidelines, that will be

More information

Genome. Institute. GenomeVIP: A Genomics Analysis Pipeline for Cloud Computing with Germline and Somatic Calling on Amazon s Cloud. R. Jay Mashl.

Genome. Institute. GenomeVIP: A Genomics Analysis Pipeline for Cloud Computing with Germline and Somatic Calling on Amazon s Cloud. R. Jay Mashl. GenomeVIP: the Genome Institute at Washington University A Genomics Analysis Pipeline for Cloud Computing with Germline and Somatic Calling on Amazon s Cloud R. Jay Mashl October 20, 2014 Turnkey Variant

More information

A review of approaches to identifying patient phenotype cohorts using electronic health records

A review of approaches to identifying patient phenotype cohorts using electronic health records A review of approaches to identifying patient phenotype cohorts using electronic health records Shivade, Raghavan, Fosler-Lussier, Embi, Elhadad, Johnson, Lai Chaitanya Shivade JAMIA Journal Club March

More information

Global infectious disease surveillance through automated multi lingual georeferencing of Internet media reports

Global infectious disease surveillance through automated multi lingual georeferencing of Internet media reports Global infectious disease surveillance through automated multi lingual georeferencing of Internet media reports John S. Brownstein, PhD Harvard Medical School Children s Hospital Informatics Program Harvard

More information

Combining Archetypes with Fast Health Interoperability Resources in Future-proof Health Information Systems

Combining Archetypes with Fast Health Interoperability Resources in Future-proof Health Information Systems 180 Digital Healthcare Empowering Europeans R. Cornet et al. (Eds.) 2015 European Federation for Medical Informatics (EFMI). This article is published online with Open Access by IOS Press and distributed

More information

EMBASE Find quick, relevant answers to your biomedical questions

EMBASE Find quick, relevant answers to your biomedical questions EMBASE Find quick, relevant answers to your biomedical questions Piotr Golkiewicz Solution Sales Manager Life Sciences Central and Eastern Europe and Russia Embase is a registered trademark of Elsevier

More information

TeamHCMUS: Analysis of Clinical Text

TeamHCMUS: Analysis of Clinical Text TeamHCMUS: Analysis of Clinical Text Nghia Huynh Faculty of Information Technology University of Science, Ho Chi Minh City, Vietnam huynhnghiavn@gmail.com Quoc Ho Faculty of Information Technology University

More information

PROPOSED WORK PROGRAMME FOR THE CLEARING-HOUSE MECHANISM IN SUPPORT OF THE STRATEGIC PLAN FOR BIODIVERSITY Note by the Executive Secretary

PROPOSED WORK PROGRAMME FOR THE CLEARING-HOUSE MECHANISM IN SUPPORT OF THE STRATEGIC PLAN FOR BIODIVERSITY Note by the Executive Secretary CBD Distr. GENERAL UNEP/CBD/COP/11/31 30 July 2012 ORIGINAL: ENGLISH CONFERENCE OF THE PARTIES TO THE CONVENTION ON BIOLOGICAL DIVERSITY Eleventh meeting Hyderabad, India, 8 19 October 2012 Item 3.2 of

More information

Bellagio, Las Vegas November 26-28, Patricia Davis Computer-assisted Coding Blazing a Trail to ICD 10

Bellagio, Las Vegas November 26-28, Patricia Davis Computer-assisted Coding Blazing a Trail to ICD 10 Bellagio, Las Vegas November 26-28, 2012 Patricia Davis Computer-assisted Coding Blazing a Trail to ICD 10 2 Publisher s Notice Although we have tried to include accurate and comprehensive information

More information

Analyzing the Semantics of Patient Data to Rank Records of Literature Retrieval

Analyzing the Semantics of Patient Data to Rank Records of Literature Retrieval Proceedings of the Workshop on Natural Language Processing in the Biomedical Domain, Philadelphia, July 2002, pp. 69-76. Association for Computational Linguistics. Analyzing the Semantics of Patient Data

More information

UCLA at TREC 2014 Clinical Decision Support Track: Exploring Language Models, Query Expansion, and Boosting

UCLA at TREC 2014 Clinical Decision Support Track: Exploring Language Models, Query Expansion, and Boosting UCLA at TREC 2014 Clinical Decision Support Track: Exploring Language Models, Query Expansion, and Boosting Jean I. Garcia-Gathright a, Frank Meng a,b, William Hsu a,b University of California, Los Angeles

More information

An unsupervised machine learning model for discovering latent infectious diseases using social media data

An unsupervised machine learning model for discovering latent infectious diseases using social media data Journal of Biomedical Informatics j o u r n al homepage: www.elsevier.com/locate/yj b i n An unsupervised machine learning model for discovering latent infectious diseases using social media data ARTICLE

More information

Cancer Gene Panels. Dr. Andreas Scherer. Dr. Andreas Scherer President and CEO Golden Helix, Inc. Twitter: andreasscherer

Cancer Gene Panels. Dr. Andreas Scherer. Dr. Andreas Scherer President and CEO Golden Helix, Inc. Twitter: andreasscherer Cancer Gene Panels Dr. Andreas Scherer Dr. Andreas Scherer President and CEO Golden Helix, Inc. scherer@goldenhelix.com Twitter: andreasscherer About Golden Helix - Founded in 1998 - Main outside investor:

More information

SFARI Gene 2.0 User Guide

SFARI Gene 2.0 User Guide 1 SFARI Gene 2.0 User Guide This document is designed to acquaint the new user with SFARI Gene release 4.0, an integrated resource for autism research, and to provide enough information to allow the user

More information

A Web Tool for Building Parallel Corpora of Spoken and Sign Languages

A Web Tool for Building Parallel Corpora of Spoken and Sign Languages A Web Tool for Building Parallel Corpora of Spoken and Sign Languages ALEX MALMANN BECKER FÁBIO NATANAEL KEPLER SARA CANDEIAS July 19,2016 Authors Software Engineer Master's degree by UFSCar CEO at Porthal

More information

SciENcv: alpha/beta and beyond. Bart Trawick, PhD National Center for Biotechnology Information National Library of Medicine

SciENcv: alpha/beta and beyond. Bart Trawick, PhD National Center for Biotechnology Information National Library of Medicine SciENcv: alpha/beta and beyond Bart Trawick, PhD National Center for Biotechnology Information National Library of Medicine SciENcv Overview SciENcv = Science Experts Network Curriculum Vitae SciENcv interagency

More information

!"#$%&'!(!%&# !"#$%"&'(") *"+,-. /0##"%120 /02&3"$45 64#10 '475"#0 8919": ;"2"<91

!#$%&'!(!%&# !#$%&'() *+,-. /0##%120 /02&3$45 64#10 '475#0 8919: ;2<91 !"#$%&'!(!%&#!"#$%"&'(") *"+,-. /0##"%120 /02&3"$45 64#10 '475"#0 8919": ;"2"5.?@ Paloma Díaz Narjès Bellamine Ben Saoud Julie Dugdale Chihab Hanachi (Eds.) ISCRAM-Med 2016 Third International Conference

More information

Toward a Unified Representation of Findings in Clinical Radiology

Toward a Unified Representation of Findings in Clinical Radiology Toward a Unified Representation of Findings in Clinical Radiology Valérie Bertaud a, Jérémy Lasbleiz ab, Fleur Mougin a, Franck Marin a, Anita Burgun a, Régis Duvauferrier ab a EA 3888, LIM, Faculty of

More information