Using Internet data to learn in the health domain

Using Internet data to learn in the health domain Carla Teixeira Lopes - ctl@fe.up.pt SSIM, MIEIC, 2016/17 Based on slides from Yom-Tov et al. (2015)

Agenda
- Internet data for health research
- Data sources
- Research works

Internet data

Internet data
- When should we use it for health research?
- Why is it useful?
- Does it have any advantages over data collected in the physical world?

Advantages of Internet data
- Easier to collect than in the physical world
- Larger sample
- More trustworthy than surveys

Easier to collect http://blog.okcupid.com/index.php/the-biggest-lies-in-online-dating/

Easier to collect (Pelleg et al., 2012)

Larger sample

Survey problem: in a survey, we depend on the quality of the answers.

Associations are hard to predict

Data sources

Data sources
- Web search
- General social media: Twitter, Facebook, Flickr
- Medical social media: eHealthMe, PatientsLikeMe
- Medical Internet aggregators: HealthMap
- Actively collecting data: crowdsourcing, online advertisements, online surveys
- Other data: smartphone interaction, fitness monitors

Web search http://www.internetlivestats.com/google-search-statistics/

Health web search
- Searching for health information is the third most popular online activity (Fox, 2011)
- 72% of American Internet users do it (Fox and Duggan, 2013)

Health web search http://healthdecide.orcahealth.com/2012/12/10/how-health-consumers-use-the-web-infographic/

Obtaining a search log
- Company
- Crowdsourcing
- Use other datasets

General social media http://healthdecide.orcahealth.com/2012/12/10/how-health-consumers-use-the-web-infographic/

General social media
- Small-scale data is generally available (e.g., in collated datasets or through crawling)
http://anadouglas.com/which-social-media-platform-are-you-on/

Medical social media
- People gathering to discuss their specific predicament
- Examples: eHealthMe, PatientsLikeMe
- Truthfulness is usually high
- Data availability can be a (legal) problem

Medical Internet aggregators: HealthMap

Crowdsourcing

Online advertisements

Online surveys: to validate findings

Other data
- Smartphone interaction
- Fitness monitors
- Internet of Things (IoT)
http://healthdecide.orcahealth.com/2012/12/10/how-health-consumers-use-the-web-infographic/

Characteristics of data sources
- Truthfulness: are people providing real information?
- Anonymity and usefulness: what do people say on each? What do they feel comfortable discussing?
  - Personal interest (news, gossip) versus personal medical need
  - Real or imagined?
- Metadata: demographics, medical diagnosis, etc.
- Explicit vs. implicit creation: patient groups versus location data
- Accessibility for research

Summary

Source | Truthfulness | Anonymity and usefulness | Metadata | Creation | Accessibility for research
------ | ------------ | ------------------------ | -------- | -------- | --------------------------
Web search | High | High | Rare | Implicit | Within companies or via toolbars
General social media | Low | Low-medium | Available | Explicit | Through hoses or scraping
Medical social media | Medium-High | High | Common | Explicit | Usually via scraping
Medical internet aggregators | High | Medium | -- | Explicit | ?
Smartphone interaction | High | Medium | None | Implicit | Very difficult
Actively collecting data | Variable | Medium | Available | Explicit | Easy (make your own!)

(Yom-Tov et al., 2015)

Research works

Postmarket drug safety surveillance via search queries
Why?
- Current postmarket drug surveillance mechanisms depend on patient reports
- Hard to identify an adverse reaction that appears only after the drug has been taken for a long period
- Hard to identify an adverse reaction when several medications are taken at the same time
Therefore: could we complement this process by looking at search queries?
(Yom-Tov and Gabrilovich, 2013)

Postmarket drug safety surveillance via search queries
Data
- Queries submitted to the Yahoo search engine during 6 months in 2010
- 176 million unique users (search logs anonymized)
Drugs under investigation
- 20 top-selling drugs (in the US)
Symptoms lexicon
- 195 symptoms from the International Statistical Classification of Diseases and Related Health Problems (ICD; WHO)
- Filtered by Wikipedia (http://en.wikipedia.org/wiki/list_of_medical_symptoms)
- Expanded with synonyms acquired by analysing the most frequently returned web page when a symptom formed the query
Aim
- Quantify the prevalence of adverse drug reactions (ADRs) for a given drug
(Yom-Tov and Gabrilovich, 2013)

Postmarket drug safety surveillance via search queries
- Ground truth: reports to repositories for safety surveillance of approved drugs, mapped to the same list of symptoms
- Score of a drug-symptom pair, n_ij: how many times a symptom was searched
- Day 0: the first day the user searched for a drug D; if the user has not searched for the drug, day 0 is the midpoint of the user's history
(Yom-Tov and Gabrilovich, 2013)
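
The day 0 rule and the symptom counts on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `(day, query)` history format, substring matching, the hypothetical drug name `druga`, and the split of counts into "before" and "on or after" day 0 are all choices made for the example.

```python
from collections import Counter

def day_zero(history, drug):
    """Return day 0 for one user: the first day a query mentions the drug,
    or the midpoint of the user's history if the drug was never searched."""
    for day, query in history:
        if drug in query.lower():
            return day
    days = [day for day, _ in history]
    return (min(days) + max(days)) // 2

def symptom_counts(histories, drug, symptoms):
    """Count symptom searches before day 0 and on or after day 0 (assumed split)."""
    before, after = Counter(), Counter()
    for history in histories:
        d0 = day_zero(history, drug)
        for day, query in history:
            text = query.lower()
            for symptom in symptoms:
                if symptom in text:
                    (after if day >= d0 else before)[symptom] += 1
    return before, after

# Hypothetical histories: lists of (day index, query), sorted by day.
histories = [
    [(1, "headache remedies"), (5, "DrugA dosage"), (9, "nausea after eating")],
    [(2, "nausea"), (10, "weather")],  # never searched for the drug
]
before, after = symptom_counts(histories, "druga", ["headache", "nausea"])
```

Comparing the "before" and "after" counts for users who did and did not search for the drug is one way such counts could feed a drug-symptom score; the slide does not give the exact formula.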

Postmarket drug safety surveillance via search queries
- Comparison of drug-symptom scores based on query logs and on the ground truth
- Which symptoms reduce this correlation the most? (most discordant ADRs)
- Aim: discover previously unknown ADRs that patients do not tend to report
(Yom-Tov and Gabrilovich, 2013)
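
One way to read "which symptoms reduce this correlation the most" is a leave-one-out comparison: drop each symptom in turn and see how much the correlation between the two score lists improves. The sketch below assumes Pearson correlation and invented scores for a single drug; the slide does not say which correlation measure was used.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def most_discordant(query_scores, report_scores):
    """Rank symptoms by how much removing them raises the correlation."""
    symptoms = sorted(query_scores)
    base = pearson([query_scores[s] for s in symptoms],
                   [report_scores[s] for s in symptoms])
    gain = {}
    for s in symptoms:
        rest = [t for t in symptoms if t != s]
        r = pearson([query_scores[t] for t in rest],
                    [report_scores[t] for t in rest])
        gain[s] = r - base
    return sorted(gain, key=gain.get, reverse=True)

# Hypothetical scores for one drug: searched often, rarely reported.
query_scores = {"nausea": 5, "headache": 3, "rash": 1, "weight gain": 6}
report_scores = {"nausea": 5, "headache": 3, "rash": 1, "weight gain": 0.5}
ranking = most_discordant(query_scores, report_scores)
```

In this toy data, "weight gain" is searched far more than it is reported, so it comes first in the ranking.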

Predicting depression via social media
- Mental illness: leading cause of disability worldwide
- 300 million people suffer from depression (WHO, 2001)
- Services for identifying and treating mental illnesses: NOT adequate
- Can content from social media (Twitter) assist?
- Focus on Major Depressive Disorder (MDD)
  - low mood
  - low self-esteem
  - loss of interest or pleasure in normally enjoyable activities
(De Choudhury et al., 2013)

Predicting depression via social media
Data set formation
- Crowdsourcing a depression survey; participants share their Twitter username
- Depression score determined via a formalized questionnaire (Center for Epidemiologic Studies Depression Scale; CES-D): from 0 (no symptoms) to 60
- 476 people diagnosed with depression with onset between September 2011 and June 2012 agreed to have their public Twitter profile monitored
- 36% with CES-D > 22 (definite depression)
Twitter feed collection
- ~2.1 million tweets
- Depression-positive users (from onset and one year back)
- Depression-negative users (from survey date and one year back)
(De Choudhury et al., 2013)

Predicting depression via social media
Examples of feature categories (47 overall)
- Engagement: daily volume of tweets, proportion of @reply posts, retweets, links, question-centric posts, normalized difference between night and day posts (insomnia index)
- Social network properties (ego-centric): followers, followees, reciprocity (average number of replies of U to V divided by number of replies from V to U), graph density (edges / nodes in a user's ego-centric graph)
- Linguistic Inquiry and Word Count (LIWC - http://www.liwc.net)
  - features for emotion: positive/negative affect, activation, dominance
  - features for linguistic style: functional words, negation, adverbs, certainty
- Depression lexicon
  - Mental health in Yahoo! Answers
  - Pointwise Mutual Information + likelihood ratio between depress* and all other tokens (top 1%)
  - TF-IDF of these terms in Wikipedia to remove very frequent terms: 1,000 depression words
- Anti-depression language: lexicon of antidepressant drug names
(De Choudhury et al., 2013)
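
Three of the features named on this slide are simple enough to sketch directly. The code below is a hedged illustration, not the paper's code: the night window (21:00 to 06:00), the exact normalization of the insomnia index, and the input formats are assumptions; reciprocity and graph density follow the wording on the slide.

```python
def insomnia_index(post_hours, night_start=21, night_end=6):
    """Normalized difference between night and day posts.
    post_hours: hour of day (0-23) of each post; the night window is assumed."""
    night = sum(1 for h in post_hours if h >= night_start or h < night_end)
    day = len(post_hours) - night
    total = night + day
    return (night - day) / total if total else 0.0

def reciprocity(replies_out, replies_in):
    """Average, over contacts V, of replies from U to V divided by replies from V to U.
    Contacts who never replied to U are skipped to avoid division by zero."""
    ratios = [replies_out[v] / replies_in[v]
              for v in replies_out if replies_in.get(v, 0) > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0

def graph_density(edges):
    """Edges divided by nodes in a user's ego-centric graph, as on the slide."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) / len(nodes) if nodes else 0.0

# Hypothetical user: three night posts and one afternoon post.
index = insomnia_index([23, 2, 3, 14])
recip = reciprocity({"a": 4, "b": 1}, {"a": 2, "b": 1})
density = graph_density([("u", "a"), ("u", "b"), ("a", "b")])
```

A positive insomnia index means the user posts more at night than during the day under the assumed window.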

Predicting depression via social media
Depressive patterns
- decrease in user engagement (volume and replies)
- higher Negative Affect (NA)
- low activation (loneliness, exhaustion, lack of energy, sleep deprivation)
[Figure panels: depression class vs. non-depression class]
(De Choudhury et al., 2013)

Predicting depression via social media
Depressive patterns
- increased use of 1st person pronouns, decreased use of 3rd person pronouns
- higher use of depression terms (examples: anxiety, withdrawal, fun, play, helped, medication, side-effects, home, woman)
[Figure panels: depression class vs. non-depression class]
(De Choudhury et al., 2013)

Other works using social media
- Twitter: HIV detection, modeling influenza rates, modeling health topics, modeling disease spread
- Flickr: pro-anorexia and pro-recovery
- Google Flu Trends: forecasting influenza
- Wikipedia: nowcasting and forecasting diseases

Does Sustained Participation in an Online Health Community Affect Sentiment?
- Large breast cancer community
- Impact of different factors on post sentiment: time since joining the community, posting activity, age, cancer stage
(Zhang et al., 2014)

Does Sustained Participation in an Online Health Community Affect Sentiment?
Dataset
- breastcancer.org
- 291,528 posts in 31,034 threads published by 12,819 community members between May 2004 and September 2010
- Metadata including user profiles were also extracted
Automated sentiment analysis
- Built a classifier
- 1,000 posts were manually annotated (positive or negative)
(Zhang et al., 2014)

Does Sustained Participation in an Online Health Community Affect Sentiment?
- For each post, a sentiment score (probability of the post being positive) was calculated
- Significant increase in the sentiment of posts over time
- Different patterns for initial posts and reply posts
- Factors play a role
(Zhang et al., 2014)
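
To make "sentiment score = probability of the post being positive" concrete, here is a minimal sketch using a word-count Naive Bayes classifier with add-one smoothing. The choice of classifier, the whitespace tokenization, and the four training posts are assumptions for illustration; the slides only say that a classifier was trained on 1,000 manually annotated posts.

```python
from collections import Counter
from math import exp, log

def train(labelled_posts):
    """labelled_posts: list of (text, label) with label 'pos' or 'neg'."""
    counts = {"pos": Counter(), "neg": Counter()}
    docs = Counter()
    for text, label in labelled_posts:
        docs[label] += 1
        counts[label].update(text.lower().split())
    return counts, docs

def sentiment_score(text, model):
    """Probability that a post is positive under the Naive Bayes model."""
    counts, docs = model
    vocab = set(counts["pos"]) | set(counts["neg"])
    total_pos = sum(counts["pos"].values())
    total_neg = sum(counts["neg"].values())
    log_odds = log(docs["pos"] / docs["neg"])
    for word in text.lower().split():
        if word not in vocab:
            continue  # ignore words never seen in training
        p_pos = (counts["pos"][word] + 1) / (total_pos + len(vocab))
        p_neg = (counts["neg"][word] + 1) / (total_neg + len(vocab))
        log_odds += log(p_pos / p_neg)
    return 1 / (1 + exp(-log_odds))

# Hypothetical annotated posts.
model = train([
    ("feeling hopeful and grateful today", "pos"),
    ("thank you all for the support", "pos"),
    ("so scared and tired of the pain", "neg"),
    ("worried about the scan results", "neg"),
])
positive_example = sentiment_score("grateful for the support", model)
negative_example = sentiment_score("scared about the pain", model)
```

Averaging such scores per user and per month since joining would give the kind of time trend the slide describes.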

A global compendium of human dengue virus occurrence
- Database comprising occurrence data linked to point or polygon locations
- Goal: generate a global risk map and associated burden estimates
- Data collection
  - Search for "dengue" in PubMed, ISI Web of Science and ProMED
  - Publications between 1960 and 2012
  - Data from HealthMap
(Messina et al., 2014)

A global compendium of human dengue virus occurrence
Geo-positioning of the data
- Location extracted from the articles
- Latitude and longitude coordinates determined using Google Maps
(Messina et al., 2014)

A global compendium of human dengue virus occurrence (Messina et al., 2014)

Tracking Flu-Related Searches on the Web for Syndromic Surveillance
- Campaign using a keyword-triggered sponsored link in Google Adsense, for Canadian searchers
- Keywords: "flu" or "flu symptoms"
- Number of impressions roughly proportional to the number of searches containing the keywords
- Daily statistics on impressions and clicks aggregated to match the time periods of the FluWatch reports
(Eysenbach, 2006)
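
The aggregation step can be sketched as follows: sum the daily ad statistics over each reporting period, then correlate the period totals with the surveillance figures. The fixed 7-day period, the use of Pearson correlation, and all numbers below are assumptions for illustration, not values from the study.

```python
from math import sqrt

def aggregate(daily, period_length=7):
    """Sum daily counts into consecutive periods of period_length days."""
    return [sum(daily[i:i + period_length])
            for i in range(0, len(daily), period_length)]

def pearson(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: 21 days of ad clicks and 3 periods of reported flu cases.
daily_clicks = [2] * 7 + [5] * 7 + [9] * 7
reported_cases = [10, 22, 41]

period_clicks = aggregate(daily_clicks)
r = pearson(period_clicks, reported_cases)
```

If the reporting periods have unequal lengths, the fixed-length slicing in `aggregate` would need to be replaced by explicit period boundaries.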

Tracking Flu-Related Searches on the Web for Syndromic Surveillance (Eysenbach, 2006)

Measuring the impact of epidemic alerts on human mobility using cell-phone network data
- Measure the impact of the alerts issued by the Mexican government during the H1N1 flu outbreak in 2009
- Mobility characterized using anonymized Call Detail Record (CDR) traces
(Frias-Martinez et al., 2012)
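
As a rough illustration of how mobility might be derived from CDR traces, the sketch below counts the distinct cell towers each user connects to per day and compares a baseline period with an alert period. The mobility measure, the `(user, day, tower)` record format, and the data are assumptions; the slide does not say how the study quantified mobility.

```python
from collections import defaultdict

def towers_per_user_day(cdrs):
    """cdrs: iterable of (user_id, day, tower_id) records.
    Returns {(user_id, day): number of distinct towers used}."""
    seen = defaultdict(set)
    for user, day, tower in cdrs:
        seen[(user, day)].add(tower)
    return {key: len(towers) for key, towers in seen.items()}

def mean_mobility(cdrs, days):
    """Average number of distinct towers per user per day over the given days."""
    counts = [n for (user, day), n in towers_per_user_day(cdrs).items()
              if day in days]
    return sum(counts) / len(counts) if counts else 0.0

# Hypothetical anonymized records: day 1 is baseline, day 2 is under alert.
cdrs = [
    ("u1", 1, "A"), ("u1", 1, "B"), ("u1", 1, "C"),
    ("u2", 1, "A"), ("u2", 1, "D"),
    ("u1", 2, "A"),
    ("u2", 2, "A"), ("u2", 2, "A"),
]
baseline = mean_mobility(cdrs, {1})
during_alert = mean_mobility(cdrs, {2})
```

A drop from `baseline` to `during_alert` would suggest reduced movement during the alert, subject to the usual caveat that towers are only observed when a call or message is made.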

Measuring the impact of epidemic alerts on human mobility using cell-phone network data (Frias-Martinez et al., 2012)

How the Napa earthquake affected Bay Area sleepers https://jawbone.com/blog/napa-earthquake-effect-on-sleep/

Topics for SSIM

Topics for SSIM
The use of Wikipedia for automatic translation in the health domain
- Using a set of Portuguese health queries, the goal of this work is to evaluate whether and how well Wikipedia can be used to automatically translate Portuguese medical expressions into English. A further goal is to compare the Wikipedia-based approaches with other well-established approaches.

Topics for SSIM
Assessing and comparing the readability of online topics
- Using a set of search queries previously classified into topics, the goal of this work is to analyze and compare the readability of the initial documents retrieved with those queries.
Evaluation of query expansion approaches using the CLEF eHealth 2016 test collection
- The goal of this work is to evaluate the query expansion approaches proposed in a previous work using a newly formed test collection. The evaluation should focus on the relevance, understandability and credibility of the obtained results.

Topics for SSIM
The use of Data Mining to understand behaviour dynamics in online health forums: state of the art
- Conduct a survey and write a scientific article on the use of Data Mining to understand behaviour dynamics in online health forums.
Automatic text simplification in the health domain: state of the art
- Conduct a survey and write a scientific article on current techniques for automatic text simplification in the health domain.

References
Dan Pelleg, Elad Yom-Tov, Yoelle Maarek (2012). Can you believe an anonymous contributor? On truthfulness in Yahoo! Answers.
Elad Yom-Tov, Evgeniy Gabrilovich (2013). Postmarket Drug Surveillance Without Trial Costs: Discovery of Adverse Drug Reactions Through Large-Scale Analysis of Web Search Queries.
Elad Yom-Tov, Ingemar Cox, Vasileios Lampos (2015). Learning about health and medicine from Internet data.
Gunther Eysenbach (2006). Tracking flu-related searches on the Web for syndromic surveillance.
Jane P. Messina, Oliver J. Brady, David M. Pigott, John S. Brownstein, Anne G. Hoen, Simon I. Hay (2014). A global compendium of human dengue virus occurrence.
Munmun De Choudhury, Michael Gamon, Scott Counts, Eric Horvitz (2013). Predicting depression via social media.
Shaodian Zhang, Erin Bantum, Jason Owen, Noémie Elhadad (2014). Does Sustained Participation in an Online Health Community Affect Sentiment?
Susannah Fox (2011). Health Topics. Pew Internet Project.
Susannah Fox, Maeve Duggan (2013). Health Online 2013. Pew Internet Project.
Vanessa Frias-Martinez, Alberto Rubio, Enrique Frias-Martinez (2012). Measuring the impact of epidemic alerts on human mobility using cell-phone network data.