Big Data in Healthcare: motivation, current state and specific use cases

Alejandro Rodríguez González
Centro de Tecnología Biomédica, Universidad Politécnica de Madrid
<alejandro.rg@upm.es>

Who are we?
MEDAL laboratory

Our expertise
What we do now:
- Data analytics: expertise in applying data mining to several domains, with special emphasis on the biomedical domain.
- Natural language processing: retrieving knowledge mainly from medical texts and Electronic Health Records, but also from other domains (for example, the legal domain).
- Image analysis: automatic analysis and annotation of images, mainly medical images, but also geospatial imaging and other fields.

Big Data in the health domain
"It's far more important to know what person the disease has than what disease the person has." Hippocrates

Big Data
Figure sources:
- http://www.terem.com.au/blog/big-data-needs-small-data/
- http://bit.ly/2g7vmp8
- http://andressilvaa.tumblr.com/post/87206443764/big-data-refers-to-5vs-volume

Who generates Big Data?
- Scientific instruments (generate any kind of data)
- Mobile devices (continuous device tracking)
- Social media (we all generate data)
- Sensor technology and networks (continuous measuring)

Big Data in Biomedicine
Two main categories of data:
- Genomic-driven data (next-generation sequencing, gene expression, ...)
- Payer-provider data (electronic health records, drug prescription, insurance prescription, ...)
Source: http://www.nature.com/scitable/topicpage/genomic-data-resourceschallenges-and-promises-743721#

The size of this data ranges from roughly 200 GB to 4 TB for a single individual. With thousands of individuals, the data reaches sizes on the order of petabytes.
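The step from per-individual sizes to petabytes is simple arithmetic. The sketch below works it out for a hypothetical cohort of 5,000 individuals, assuming decimal units (1 PB = 1,000 TB = 1,000,000 GB); the cohort size and the function name are illustrative and do not come from the slides.

```python
# Decimal storage units: 1 TB = 1000 GB, 1 PB = 1000 TB.
GB_PER_TB = 1000
TB_PER_PB = 1000


def cohort_size_pb(individuals: int, gb_per_individual: float) -> float:
    """Total storage in petabytes for a cohort with equally sized records."""
    total_gb = individuals * gb_per_individual
    return total_gb / (GB_PER_TB * TB_PER_PB)


# Hypothetical cohort of 5,000 individuals (an assumption, not a figure from the talk).
low = cohort_size_pb(5000, 200)    # 200 GB per individual
high = cohort_size_pb(5000, 4000)  # 4 TB per individual
print(f"{low} PB to {high} PB")
```

Under these assumptions the cohort needs between 1 and 20 PB, which matches the order of magnitude quoted on the slide.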

Motivation
- In 2012, worldwide digital healthcare data was estimated at 500 petabytes, and it is expected to reach 25,000 petabytes in 2020.
- Can we learn from the past to become better in the future?
- Healthcare data is becoming more complex: several types of data, structured and unstructured.

Motivation: the problem
- Millions of reports, tasks, incidents, events, images, ...
- Availability
- Lack of protocols and structure
- Organization-oriented processes
- From information to knowledge

Examples
Clopidogrel (Plavix) is a drug used to prevent blood clots that can cause heart attacks or strokes. There was concern that other drugs (proton pump inhibitors, which reduce stomach acid) could interfere with the activation of clopidogrel. Medco used its database to look for differences between two cohorts: patients who took only one of the drugs and patients who took both, so that the two drugs could interact. The study found that patients taking both drugs were 50% more likely to suffer the events that clopidogrel is meant to prevent.
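A cohort comparison of this kind boils down to comparing event rates between two groups. The following is a minimal sketch of that calculation; the patient and event counts are hypothetical, chosen only so that the result reproduces the "50% more likely" figure, and are not data from the Medco study.

```python
def relative_risk(events_exposed: int, n_exposed: int,
                  events_control: int, n_control: int) -> float:
    """Ratio of the event rate in the exposed cohort to the rate in the control cohort.

    Cross-multiplied form of (events_exposed / n_exposed) / (events_control / n_control).
    """
    return (events_exposed * n_control) / (events_control * n_exposed)


# Hypothetical counts: 1,000 patients per cohort.
# Exposed = both drugs taken together; control = only one of the drugs.
rr = relative_risk(events_exposed=150, n_exposed=1000,
                   events_control=100, n_control=1000)
print(f"relative risk = {rr}, i.e. {(rr - 1) * 100:.0f}% more likely")
```

A relative risk of 1.5 corresponds to a 50% higher event rate in the exposed cohort. A real study would also report confidence intervals and adjust for confounders, which this sketch leaves out.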

Another similar study showed that antidepressants blocked the effectiveness of tamoxifen, a drug used to prevent the recurrence of breast cancer, so that patients taking both drugs were twice as likely to have their cancer recur.
In any case, these examples are hypothesis-driven. One of the goals of Big Data analysis is to make it possible to run studies without a given hypothesis.

More examples: http://www.journalofbigdata.com/content/1/1/2

Use case 1: predictive medicine
Provide the right intervention to the right patient at the right time.
Acquire, process, analyze -> understand -> predict

What clinicians aim for: evidence-based medicine
- Correlations and associations among symptoms, family history, habits and diseases.
- Impact of certain biomedical factors (genome structure, clinical variables) on the evolution of certain diseases.
- Automatic classification of images (prioritization of X-ray images to help diagnosis).
- Automatic annotation of images.
- Diagnostic aid tools based on natural-language (Google-style) queries.

What researchers aim for
- Find early indicators of diseases.
- Design clinical trials.
- Search the literature automatically, analyzing the text of the papers and not only keywords.
- Use analytics services available on the web.
- Use cloud data and services to obtain knowledge from other hospitals, countries, ...

At what stage do we compute the prediction?
[Figure: health care expenses across patient stages: healthy, low risk, at risk, high risk, early symptoms.]
20% of the population generates 80% of the costs.
Analyze data so as to act as soon as possible:
- Early detection
- Personalized evidence
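The "20% of the population generates 80% of the costs" observation can be checked on any cost table by ranking patients by cost and summing the top fraction. The sketch below does this on hypothetical annual costs for ten patients; the numbers and the function name are invented for illustration.

```python
def top_share(costs, fraction=0.2):
    """Share of the total cost generated by the most expensive `fraction` of patients."""
    ranked = sorted(costs, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # number of patients in the top group
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical annual costs (arbitrary currency units) for 10 patients.
costs = [400, 400, 25, 25, 25, 25, 25, 25, 25, 25]
share = top_share(costs)
print(f"top 20% of patients generate {share:.0%} of the costs")
```

With real data, the same function gives a quick indication of how concentrated the costs are, and therefore how much early detection in the high-cost group could matter.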

From data to knowledge
Data acquisition -> Data processing -> Modelling -> Validation -> Apply
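The five stages can be pictured as a chain of small functions. The sketch below is a toy version under strong assumptions: the records, the single numeric measurement, and the threshold "model" are all hypothetical stand-ins, not the methods used in the projects described in the talk.

```python
from statistics import mean


def acquire():
    """Data acquisition: raw (measurement, label) records; None marks a missing value."""
    return [(90, False), (160, True), (None, False),
            (100, False), (150, True), (170, True)]


def process(records):
    """Data processing: drop records with a missing measurement."""
    return [(x, y) for x, y in records if x is not None]


def fit_threshold(train):
    """Modelling: place a cut-off halfway between the two class means."""
    positives = [x for x, y in train if y]
    negatives = [x for x, y in train if not y]
    return (mean(positives) + mean(negatives)) / 2


def validate(threshold, test):
    """Validation: fraction of held-out records classified correctly."""
    hits = sum((x >= threshold) == y for x, y in test)
    return hits / len(test)


def apply_model(threshold, measurement):
    """Apply: classify a new, unseen measurement."""
    return measurement >= threshold


records = process(acquire())
train, test = records[:3], records[3:]  # simple hold-out split
threshold = fit_threshold(train)
accuracy = validate(threshold, test)
print(threshold, accuracy, apply_model(threshold, 130))
```

Each stage only consumes the output of the previous one, so any of them can be swapped for a more realistic component without touching the rest.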

1st step: Data acquisition
EHR:
- Structured data:
  - Lab tests (LOINC):
    - Many lab systems still use local dictionaries to encode labs
    - Diverse numeric scales across labs
    - Missing data
  - Clinical and demographic data (ICD):
    - ICD stands for International Classification of Diseases
    - ICD is a hierarchical terminology of diseases, signs, symptoms and procedure codes maintained by the World Health Organization (WHO)
    - Pros: universally available
    - Cons: medium recall and medium precision for characterizing patients
- Non-structured data:
  - Images
  - Clinical notes
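The three lab-test problems listed above (local dictionaries, diverse scales, missing data) can be handled in one normalization step. The sketch below is an illustration under stated assumptions: the local codes `GLU` and `GLUC_S` and the record layout are hypothetical, 2345-7 is used as the LOINC code for glucose in serum or plasma (mass/volume), and the conversion factor follows from the molar mass of glucose (about 180.16 g/mol).

```python
# Hypothetical local lab codes mapped to a LOINC code.
LOCAL_TO_LOINC = {"GLU": "2345-7", "GLUC_S": "2345-7"}

# 1 mmol/L of glucose is about 18.016 mg/dL (molar mass ~180.16 g/mol).
MG_DL_PER_MMOL_L = 18.016


def normalize(result):
    """Map a local lab result to LOINC and mg/dL; return None if it cannot be normalized."""
    loinc = LOCAL_TO_LOINC.get(result.get("code"))
    value = result.get("value")
    if loinc is None or value is None:  # unknown local code or missing data
        return None
    unit = result.get("unit", "").lower()
    if unit == "mmol/l":
        value = value * MG_DL_PER_MMOL_L
    elif unit != "mg/dl":  # unsupported scale
        return None
    return {"loinc": loinc, "value_mg_dl": round(value, 1)}


print(normalize({"code": "GLU", "value": 5.0, "unit": "mmol/L"}))
print(normalize({"code": "GLUC_S", "value": 140, "unit": "mg/dL"}))
print(normalize({"code": "GLU", "value": None, "unit": "mg/dL"}))
```

Returning `None` for records that cannot be normalized keeps missing data explicit, so that later stages can decide whether to drop or impute it.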

Standards
- MeSH (Medical Subject Headings): a thesaurus for indexing articles for PubMed.
- UMLS (Unified Medical Language System): integrates key terminology among different coding standards.
- SNOMED CT: standard for clinical terminology.
- DICOM (Digital Imaging and Communications in Medicine): standard for storing, transmitting and processing medical images.
- GS1 standards: used to uniquely identify medical products.
- LOINC (Logical Observation Identifiers Names and Codes): standard for identifying laboratory and clinical observations.
- RxNorm: standard that provides normalized names for drugs and pharmacy products.

Text processing: NLP

Some specific challenges
- Acronyms
- Entity recognition for diagnosis terms
- Numbers and metrics

Meaning of acronyms
How to decide the meaning of an acronym:
- It is context dependent
- Use UMLS
- Use machine learning to learn it
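To make "context dependent" concrete, the sketch below picks the expansion of an acronym whose cue words overlap most with the surrounding sentence. This is a deliberately simple stand-in: the sense inventory and cue words are hypothetical (in practice the candidate expansions could come from UMLS), and simple word overlap replaces the machine learning model mentioned on the slide.

```python
import re

# Hypothetical sense inventory: acronym -> expansion -> cue words.
SENSES = {
    "RA": {
        "rheumatoid arthritis": {"joint", "joints", "swelling", "stiffness", "methotrexate"},
        "right atrium": {"echocardiogram", "heart", "valve", "ventricle", "dilated"},
    }
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(acronym, sentence):
    """Return the expansion whose cue words overlap most with the sentence, or None."""
    senses = SENSES.get(acronym)
    if not senses:
        return None
    context = set(tokenize(sentence))
    scores = {expansion: len(cues & context) for expansion, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("RA", "Echocardiogram shows a dilated RA and normal valve function"))
print(disambiguate("RA", "RA with morning stiffness and joint swelling, on methotrexate"))
```

A learned classifier would replace the hand-written cue sets with features estimated from annotated notes, but the interface (acronym plus context in, expansion out) stays the same.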

Example of acronym disambiguation

Proposed solution
Clinical notes -> Detect and expand acronyms (context dependent, using learning algorithms) -> Enriched clinical notes -> Feature extraction -> BD models

Comparison of results
[Figure: precision, recall and F1 for Approach 1, Approach 2, Approach 3 and Approach 4; axis range 0.961 to 0.968.]
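For reference, the three metrics in the comparison are computed from the counts of true positives, false positives and false negatives. The sketch below uses the standard definitions; the counts in the example are hypothetical and unrelated to the values in the figure.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical evaluation: 8 acronyms expanded correctly, 2 wrong expansions, 2 missed.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```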

Diagnosis terms
- Identifying diagnosis elements in medical texts is a crucial task.
- It is mainly used for the development of medical diagnosis systems.
- It is also relevant for building human symptoms-disease networks, a challenging research area where this information is very important.

cTAKES and MetaMap
- Goal: compare the accuracy of both tools when extracting general medical terms, considering only terms used in the diagnosis context.
- The experiment was performed with our framework, in which Apache cTAKES is used as the NER component.
- We manually analyzed the results and compared MetaMap with Apache cTAKES.

Comparison of results

Numbers and metrics
- Treatments: "ibuprofen 2 cp/d"
- Laboratory tests: "glucose 140mg"
- Blood pressure measurements: "140/92 mmHg"
- Dates:
  - Absolute: "MRI on 14/02/2016"
  - Relative: "patient suffered from headache two days ago"
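Expressions like the ones above can be captured with regular expressions as a first approximation. The sketch below covers blood pressure and the two kinds of dates; the patterns, the day/month/year assumption, and the need to know the note date to resolve relative expressions are assumptions made for illustration, and real notes would require far more variants.

```python
import re
from datetime import date, timedelta

BP = re.compile(r"\b(\d{2,3})\s*/\s*(\d{2,3})\s*mmhg\b", re.IGNORECASE)
# Day/month/year is assumed; lookarounds tolerate a missing space before the date.
ABS_DATE = re.compile(r"(?<!\d)(\d{1,2})/(\d{1,2})/(\d{4})(?!\d)")
REL_DATE = re.compile(r"\b(one|two|three|four|five|six|seven)\s+days?\s+ago\b", re.IGNORECASE)
WORD_TO_INT = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6, "seven": 7}


def blood_pressure(text):
    """Return (systolic, diastolic) in mmHg, or None if no reading is found."""
    m = BP.search(text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def absolute_date(text):
    """Return the first dd/mm/yyyy date in the text, or None."""
    m = ABS_DATE.search(text)
    if not m:
        return None
    day, month, year = map(int, m.groups())
    return date(year, month, day)


def relative_date(text, note_date):
    """Resolve 'N days ago' against the date the note was written, or return None."""
    m = REL_DATE.search(text)
    if not m:
        return None
    return note_date - timedelta(days=WORD_TO_INT[m.group(1).lower()])


print(blood_pressure("Blood pressure 140/92 mmhg."))
print(absolute_date("MRI on14/02/2016"))
print(relative_date("patient suffered from headache two days ago", date(2016, 2, 16)))
```

The relative date is the harder case: the expression alone carries no calendar information, so the extractor needs the note's own date as an anchor.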

Use case 3: Data Analytics (the real insight)
Source: https://akshaykher.wordpress.com/2015/08/18/how-to-start-a-career-inanalytics-for-free-3/

Data Science

Numbers and metrics
[Figure: knowledge discovery process. Data -> (selection) -> Objective data -> (cleaning) -> Processed data -> (codification) -> Transformed data -> (data mining) -> Models -> (interpretation and evaluation) -> Knowledge]

Some interesting facts about data science [4]

Profile of the survey respondents

Finding 1: There's Still a Shortage of Data Scientists (And It Might Be Getting Worse)

Finding 2: Data Scientists Love Their Jobs

How a Data Scientist Spends Their Day

Why That's a Problem
Simply put, data wrangling isn't fun. It takes forever. In fact, a few years back, the New York Times estimated that up to 80% of a data scientist's time is spent doing this sort of work.

Do Data Scientists Have What They Need?

The Top 10 In-Demand Data Science Skills
We looked at nearly 4,000 data science job postings on LinkedIn to find out what skills organizations wanted from their new hires. We ran those job postings through the CrowdFlower platform and had our contributors mark which skills showed up in which jobs.

What's Next for Data Science?
To put it simply: machine learning.

Projects: NDMonitor
Integrated low-cost platform for monitoring and assisting patients with neurodegenerative diseases that affect mental capabilities.
- Track patients' movements at home.
- Monitor and analyse their behaviours and actions.
- Track via GPS when the patient leaves home.
- Allow therapists and families to see reports.

Projects: FACET
Frailty care and well function.
- Integration of human phenotypic data.
- Early detection of frailty.
- Focus on intervention.
- Prevent or delay disability.
- Estimated impact on the quality of life of 13.05 million people.
- Integration with the data lakes and services layer already developed by GMV and BULL.

Projects: PAPHOS
Platform for advance prescriptive health operational system.
- ICT to support both professionals and patients.
- Two use cases: medical imaging (cellular mitosis in lung cancer) and data analytics (apnoea improvement with CPAP).
- Fully functional platform that includes interoperability capabilities.
- Security layer to protect the data.
- Integration with the data lakes and services layer already developed by GMV and BULL.

Use case 3: Large-scale HSDN
Analysis of disease networks from a large-scale perspective:
- Extraction of phenotypic knowledge from multiple sources (text mining/NLP).
- Identification and use of well-known biological databases.
- Creation of large disease networks.
- Mapping and analysis of common features for drug repurposing.
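One simple way to picture the creation of a disease network from extracted phenotypic knowledge is to link two diseases whenever their symptom sets overlap enough. The sketch below uses Jaccard overlap as a plain stand-in similarity; it is not claimed to be the measure used in [3], and the symptom lists and the threshold are toy values for illustration only.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(disease_symptoms, threshold=0.3):
    """Return {(disease1, disease2): weight} for pairs whose symptom overlap reaches the threshold."""
    edges = {}
    for d1, d2 in combinations(sorted(disease_symptoms), 2):
        weight = jaccard(disease_symptoms[d1], disease_symptoms[d2])
        if weight >= threshold:
            edges[(d1, d2)] = weight
    return edges


# Toy symptom sets, as they might come out of a text-mining step.
disease_symptoms = {
    "influenza": {"fever", "cough", "fatigue", "headache"},
    "common cold": {"cough", "sneezing", "sore throat", "fatigue"},
    "migraine": {"headache", "nausea", "photophobia"},
}

edges = build_network(disease_symptoms)
for (d1, d2), weight in edges.items():
    print(f"{d1} -- {d2}: {weight:.2f}")
```

At large scale the same idea applies, but the pairwise comparison and the choice of similarity measure and threshold become the main engineering concerns.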

Human disease complex networks
Source: [3]

Conclusions
- Big Data is more than BIG data: it is not only about size.
- Extraction from unstructured sources (such as text) is a huge part of Big Data in eHealth.
- Analytics is the point where everything converges.

Any questions?

References
1. Harkema et al. (2009). ConText: An algorithm for determining negation, experiencer, and temporal status from clinical reports. Journal of Biomedical Informatics. Vol. 42(5), pp. 839-851.
2. Goh et al. (2007). The human disease network. Proceedings of the National Academy of Sciences. Vol. 104(21), pp. 8685-8690.
3. Zhou et al. (2014). Human symptoms-disease network. Nature Communications. Vol. 5.
4. https://visit.crowdflower.com/2015-data-scientist-report