Large Scale Analysis of Health Communications on the Social Web. Michael J. Paul Johns Hopkins University


Who am I?
- 3rd-year PhD student
- Background: computer science (not an expert in health, medicine, or sociology :-) )
  - machine learning, natural language processing
- Affiliations:
  - Center for Language and Speech Processing
  - Human Language Technology Center of Excellence
- Collaborators (re: health + social media):
  - Mark Dredze (my co-advisor)
  - Hieu Tran, Alex Lamb, more students on the way

This Talk
- Methodology: simple statistical tools
  - Topic Models
- Health analytics through social media
  - Disease surveillance on Twitter
  - Analysis of specialized discussion forums
- Communication on the Web
  - Simple, scalable models of discourse

Probabilistic Topic Models
- Unsupervised machine learning
  - Doesn't require human input
- Idea: associate words and documents with different categories called topics
  - But we don't know what the topics are!
  - Learn automatically through pattern recognition called clustering
- Linguistically bare, but useful for shallow analysis
  - Helpful way to explore massive text collections
  - Scalable and robust
- Really, really popular
  - LDA (2003) has 4,600 citations

Probabilistic Topic Models
- Statistical model of text generation
- Two types of parameters:
  - p(topic | document) for each document
  - p(word | topic) for each topic
- Optimize parameters to fit model to data (i.e. a collection of documents)
  - maximum likelihood estimation, Monte Carlo, etc.
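To make the two parameter types concrete, here is a minimal Python sketch of how they combine into the probability of a word in a document. The topic names, vocabulary, and all probabilities are invented for illustration and do not come from the talk; the helper names `word_probability` and `log_likelihood` are hypothetical.

```python
import math

# Hypothetical toy parameters (not from the talk): two topics, four words.
# p(word | topic): one distribution over the vocabulary per topic.
p_word_given_topic = {
    "flu":     {"fever": 0.5,  "cough": 0.4,  "pollen": 0.05, "sneeze": 0.05},
    "allergy": {"fever": 0.05, "cough": 0.15, "pollen": 0.4,  "sneeze": 0.4},
}
# p(topic | document): one distribution over topics for a single document.
p_topic_given_doc = {"flu": 0.75, "allergy": 0.25}


def word_probability(word, doc_topics, topic_words):
    """p(word | document) = sum over topics of p(topic | document) * p(word | topic)."""
    return sum(doc_topics[t] * topic_words[t][word] for t in doc_topics)


def log_likelihood(words, doc_topics, topic_words):
    """Log-probability of a document's words under the mixture of topics."""
    return sum(math.log(word_probability(w, doc_topics, topic_words)) for w in words)


doc = ["fever", "cough", "cough", "sneeze"]
print(word_probability("fever", p_topic_given_doc, p_word_given_topic))
print(log_likelihood(doc, p_topic_given_doc, p_word_given_topic))
```

Fitting the model amounts to adjusting both tables so that this log-likelihood, summed over the whole collection, becomes as large as possible.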

Probabilistic Topic Models
D. Blei, A. Ng, M. Jordan. Latent Dirichlet Allocation. JMLR 2003.

Probabilistic Topic Models
- Why does this work?
- Intuition: words which co-occur in non-random ways are likely to be generated from the same topic
- Distributional Hypothesis (Harris 1954)
  - words in similar contexts have similar meaning
  - part of distributional semantics
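The co-occurrence intuition can be illustrated by simply counting how often two words appear in the same document. This is a minimal sketch, not the topic model itself; the four example documents and the function name `cooccurrence_counts` are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered word pair, the number of documents containing both."""
    counts = Counter()
    for doc in documents:
        vocab = sorted(set(doc.lower().split()))  # unique words, in a fixed order
        for pair in combinations(vocab, 2):
            counts[pair] += 1
    return counts


# Hypothetical toy documents (not from the talk's data).
docs = [
    "fever cough flu",
    "flu fever chills",
    "pollen sneeze allergy",
    "allergy pollen itchy",
]
counts = cooccurrence_counts(docs)
print(counts[("fever", "flu")])       # pair that appears together
print(counts[("flu", "pollen")])      # pair that never appears together
```

Word pairs that keep showing up together ("fever" and "flu") are the ones a topic model would tend to place in the same topic, while pairs that never co-occur ("flu" and "pollen") would tend to end up in different ones.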

This Talk
- Method: simple statistical tools
  - Topic Models
- Health analytics through social media
  - Disease surveillance on Twitter
  - Analysis of specialized discussion forums
- Communication on the Web
  - Simple, scalable models of discourse

Twitter for Public Health
- Huge amount of data
  - 200+ million tweets every day (but only a small sample is free)
- Decent proxy for what's going on in the world
  - including population health
- Tweets are usually publicly available through the API
  - vs Facebook (more private accounts)
  - Usually qualifies for IRB waiver

Twitter Data Collection
- 1.6 million health tweets from 2009-2010
  - distilled from a stream of 2 billion total tweets
  - machine learning to distinguish pertinent tweets ("stuck in bed sick") vs false positives ("so sick of school!")
- Today (since Fall 2011): downloading ~2 million tweets a day
  - using 400 targeted health search queries
  - estimated 10% are pertinent
- In progress:
  - Public web interface to analyze data
  - Languages beyond English (next up: Spanish)
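The slide does not say which classifier was used to separate pertinent tweets from false positives. As one possible illustration, the sketch below assumes a small multinomial Naive Bayes classifier with add-one smoothing; the labels `health` and `other`, the four training tweets, and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace, dropping exclamation marks."""
    return text.lower().replace("!", " ").split()


def train(examples):
    """Collect label counts, per-label word counts, and the vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(text, model):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set; a real system would need far more labeled tweets.
examples = [
    ("stuck in bed sick", "health"),
    ("home sick with a fever", "health"),
    ("so sick of school!", "other"),
    ("sick of this homework", "other"),
]
model = train(examples)
print(predict("in bed with a fever", model))
print(predict("sick of school", model))
```

The word "sick" alone does not decide the label here; the surrounding words ("bed", "fever" versus "of", "school") carry the signal, which is the distinction the slide's two example tweets point at.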

Twitter Data Collection
- This is a lot of text!
- We don't (or at least, we didn't) know what health issues people actually tweet about.
- How to make sense of this much data?
  - Topic models

Ailment Topic Aspect Model
- Topic model with special structure designed to find diseases and ailments
- Breaks down words by aspect: general words, symptom words, or treatment words
- We used the model output to analyze disease over time and location
M.J. Paul and M. Dredze. You Are What You Tweet: Analyzing Twitter for Public Health. ICWSM 2011.

Ailment Topic Aspect Model

Twitter: Influenza Rates

Twitter: Allergy Prevalence

Beyond Twitter
- Twitter is good for general population surveys
- More specialized applications would benefit from focused data sources, e.g.
  - medical forums and question answering services
  - online support groups
  - etc.
- We are currently investigating forums for drug users

Drug Abuse on the Web
- Data: Drugs-Forum.com
  - Discussion forums about recreational drug use
  - Our working data set: 100K messages
- Why?
  - Offers candid survey of drug use
  - User base much larger than could be reached through surveys
- End goal: automatically extract and summarize information about various drugs, especially novel and emerging drugs

Drug Abuse on the Web
- How to explore a giant discussion forum?
  - Probabilistic topic models!
- Our previous topic model included structure about symptoms and treatments of ailments
- Here, we use a factored topic model to include delivery method (e.g. smoking, oral) and other aspects (e.g. effects, usage, chemistry).
M.J. Paul and M. Dredze. Experimenting with Drugs (and Topic Models): Multi-Dimensional Exploration of Recreational Drug Discussions. AAAI IRKD-Bio Workshop. 2012.

Drug Abuse on the Web
Can model different delivery methods of drugs:
- (cocaine, smoking): crack, smoke, smoking, pipe, hit, rock, powder, smoked, rocks, ash, cigarette
- (cocaine, snorting): nose, water, nasal, spray, mouth, nostril, snorting, sinuses, snort, coke, blow

Drug Abuse on the Web
Can model different aspects of drugs:
- (ecstasy, effects): experience, time, people, experiences, feeling, drug, feel, magic, first, feelings, euphoria, mind
- (ecstasy, usage): pills, mdma, pill, test, ecstasy, pure, quality, kit, contain, testing, people, know
- (ecstasy, culture): music, rolling, rave, people, great, mp3, best, dance, friends, club, love, fun

Drug Abuse on the Web
Can drill down to triples of (drug, delivery, aspect):
- (cocaine, health): cocaine, addiction, drug, dopamine, people, brain, drugs, substance, addictive, using, body, physical
- (cocaine, snorting, health): nose, pain, damage, blood, cocaine, problem, cause, cola, using, doctor, problems, sinus

This Talk
- Method: simple statistical tools
  - Topic Models
- Health analytics through social media
  - Disease surveillance on Twitter
  - Analysis of specialized discussion forums
- Communication on the Web
  - Simple, scalable models of discourse

Topic Models for Dialog
- Mixed membership Markov models
  - Like topic models but aware of thread structure
- Topics in document depend on previous document
  - i.e. speech acts in message depend on speech acts of parent message in thread
  - Your response to me depends on what I wrote
M.J. Paul. Mixed Membership Markov Models for Unsupervised Conversation Modeling. EMNLP 2012.

Topic Models for Dialog
- Classes shown: question, answer
- Top words: i, my, have, computer, tried, help, you, your, http, com., windows
- Annotations: likely to begin threads; likely to follow questions; not likely to follow itself
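The dependence of a message on its parent can be sketched with a plain Markov chain over message classes. This is a deliberate simplification: it assumes each message has exactly one class and that a thread is a simple chain of replies, whereas the mixed membership model lets a message mix several classes. The transition probabilities below are invented for illustration, chosen only so that questions tend to begin threads and answers tend to follow questions; the names `START`, `transition`, and `thread_probability` are hypothetical.

```python
START = "<start>"  # hypothetical marker for the beginning of a thread

# Hypothetical transition table: transition[parent][child] = p(child class | parent class).
transition = {
    START:      {"question": 0.75, "answer": 0.25},
    "question": {"question": 0.25, "answer": 0.75},
    "answer":   {"question": 0.75, "answer": 0.25},
}


def thread_probability(classes):
    """Probability of a chain of message classes, each conditioned on its parent."""
    prob = 1.0
    parent = START
    for child in classes:
        prob *= transition[parent][child]
        parent = child
    return prob


print(thread_probability(["question", "answer"]))  # 0.5625
print(thread_probability(["answer", "answer"]))    # 0.0625
```

Under these made-up numbers a thread that opens with a question followed by an answer is nine times more probable than one that opens with two answers in a row.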

Topic Models for Dialog
- In the future, I plan to add additional features to the model to potentially consider
  - participant attributes (e.g. gender)
  - participant role (e.g. doctor vs patient)
  - relationship between participants

Lastly
- How to fit conversation modeling with health applications?

Potential Avenues for Future Work
- Study of intra-clinic communications
  - e.g. internal SMS system (if such data can be obtained)
- Spread of (mis)information
  - models of rumor propagation
  - models of influence, ideology
- Modeling doctor-patient interactions
  - more generally, interactions between experts and non-experts on the web

Thank You

Other Twitter Studies
- Learning to distinguish concern and awareness of the flu vs infection with the flu
  - "home with the flu" vs "roommates all have the flu, worried i'm next"
- Application area: behavioral epidemiology
  - use tweets as a data source to validate behavioral models
A. Lamb, M. Paul, M. Dredze. Investigating Twitter as a Source for Studying Behavioral Responses to Epidemics. AAAI IRKD-Bio Workshop. 2012.

Other Twitter Studies
- Analyze mentions of patient safety events (e.g. medical mistakes)
R. Passerella, A. Nakhasi, S. Bell, M. Paul, P. Pronovost, M. Dredze. Twitter as a Source for Learning about Patient Safety Events. AMIA 2012.
A. Nakhasi, R. Passerella, S. Bell, M. Paul, M. Dredze, P. Pronovost. Malpractice and Malcontent: Analyzing Medical Complaints in Twitter. AAAI IRKD-Bio Workshop. 2012.