CHALLENGES IN INFLUENZA FORECASTING AND OPPORTUNITIES FOR SOCIAL MEDIA MICHAEL J. PAUL, MARK DREDZE, DAVID BRONIATOWSKI

Similar documents
USING SOCIAL MEDIA TO STUDY PUBLIC HEALTH MICHAEL PAUL JOHNS HOPKINS UNIVERSITY MAY 29, 2014

Towards Real-Time Measurement of Public Epidemic Awareness

Influenza forecast optimization when using different surveillance data types and geographic scale

Ongoing Influenza Activity Inference with Real-time Digital Surveillance Data

arxiv: v1 [q-bio.pe] 9 Apr School of Mathematical and Statistical Sciences, Arizona State University, Tempe, AZ

Worldwide Influenza Surveillance through Twitter

The Texas Pandemic Flu Toolkit: Models for pandemic surveillance, preparation and response

Cloud- based Electronic Health Records for Real- time, Region- specific

Forecasting Influenza Levels using Real-Time Social Media Streams

Data Driven Methods for Disease Forecasting

Searching for flu. Detecting influenza epidemics using search engine query data. Monday, February 18, 13

Williamson County Influenza Surveillance Williamson County and Cities Health District Final Report August 25, 2015

Williamson County Influenza Surveillance

ESRI Health Conference

A systematic review of studies on forecasting the dynamics of influenza outbreaks

Flu Watch. MMWR Week 3: January 14 to January 20, and Deaths. Virologic Surveillance. Influenza-Like Illness Surveillance

Flu Watch. MMWR Week 4: January 21 to January 27, and Deaths. Virologic Surveillance. Influenza-Like Illness Surveillance

Ohio Department of Health Seasonal Influenza Activity Summary MMWR Week 14 April 3 9, 2011

Tom Hilbert. Influenza Detection from Twitter

Ohio Department of Health Seasonal Influenza Activity Summary MMWR Week 14 April 1-7, 2012

Asthma Surveillance Using Social Media Data

Williamson County Influenza Surveillance

Williamson County Influenza Surveillance

Advances in epidemic forecasting & implications for vaccination

Highlights. NEW YORK CITY DEPARTMENT OF HEALTH AND MENTAL HYGIENE Influenza Surveillance Report Week ending January 28, 2017 (Week 4)

Regional Level Influenza Study with Geo-Tagged Twitter Data

Ohio Department of Health Seasonal Influenza Activity Summary MMWR Week 3 January 13-19, 2013

Georgia Weekly Influenza Report

Ohio Department of Health Seasonal Influenza Activity Summary MMWR Week 7 February 10-16, 2013

Tarrant County Influenza Surveillance Weekly Report CDC Week 39: Sept 23-29, 2018

Georgia Weekly Influenza Report

Tarrant County Influenza Surveillance Weekly Report CDC Week 43: Oct 22-28, 2017

Tarrant County Influenza Surveillance Weekly Report CDC Week 17: April 22-28, 2018

Utilizing syndromic surveillance data from ambulatory care settings in NYC

Tarrant County Influenza Surveillance Weekly Report CDC Week 51: Dec 17-23, 2017

Tarrant County Influenza Surveillance Weekly Report CDC Week 4: Jan 21-27, 2018

Georgia Weekly Influenza Report

Tarrant County Influenza Surveillance Weekly Report CDC Week 11: Mar 10-16, 2019

Tarrant County Influenza Surveillance Weekly Report CDC Week 20: May 11-17, 2014

Influenza Surveillance in the United St ates

Influenza Report Week 45

Influenza Report Week 44

Tarrant County Influenza Surveillance Weekly Report CDC Week 35, August 27-September 2, 2017

Abstract. But what happens when we apply GIS techniques to health data?

Influenza Surveillance Summary: Denton County Hospitals and Providers 17.51% 15.95% 10.94% 7.41% 9.01% 8.08%

International Co-circulation of 2009 H1N1 and Seasonal Influenza (As of September 4, 2009; posted September 11, 2009, 6:00 PM ET)

Tarrant County Influenza Surveillance Weekly Report CDC Week 40: Oct 2-8, 2016

CSTE Right Size Surveillance Project Model Practices and Strategies. ILINET Provider Customized Report Card

Certificate of Appreciation: Centers for Disease Control By James L. Holly, MD Your Life Your Health The Examiner October 16, 2014

Influenza Report Week 47

North Dakota Weekly Influenza Update Influenza Season

October 19 - October 25, 2014 (MMWR Week 43)

A Smartphone-Driven Thermometer Application for Real-time Population- and Individual-Level Influenza Surveillance

Visual and Decision Informatics (CVDI)

Tarrant County Influenza Surveillance Weekly Report CDC Week 10: March 2-8, 2014

Tarrant County Influenza Surveillance Weekly Report CDC Week 06: February 2-8, 2014

ILI Syndromic Surveillance

Influenza Surveillance Report Saint Louis County Department of Public Health Week Ending 10/14/2018 Week 41

Influenza Surveillance Report Saint Louis County Department of Public Health Week Ending 11/11/2018 Week 45

Large Scale Analysis of Health Communications on the Social Web. Michael J. Paul Johns Hopkins University

FLU TREND PREDICTION IN THE WORLD OF BIG DATA. A new way to prevent influenza outbreaks

Towards Real Time Epidemic Vigilance through Online Social Networks

Outbreak Response/Epidemiology Influenza Weekly Report Arkansas

Tarrant County Influenza Surveillance Weekly Report CDC Week 5, Jan 29 Feb 4, 2017

Influenza Surveillance Report Saint Louis County Department of Public Health Week Ending 01/20/2019 Week 3

Influenza Surveillance Report Saint Louis County Department of Public Health Week Ending 12/30/2018 Week 52

Influenza Surveillance Report Saint Louis County Department of Public Health Week Ending 10/21/2018 Week 42

Director s Update Brief novel 2009-H1N1. Tuesday 21 JUL EDT Day 95. Week of: Explaining the burden of disease and aligning resources

WEEKLY FLU REPORT. This report includes... Important Notes: In This Issue

November 9 - November 15, 2014 (MMWR Week 46)

Outbreak Response/Epidemiology Influenza Weekly Report Arkansas

FUNNEL: Automatic Mining of Spatially Coevolving Epidemics

September 28-October 4, 2014 (MMWR Week 40) Highlights

December 7 - December 13, 2014 (MMWR Week 50) Highlights

Effects of Sampling on Twitter Trend Detection

Oregon s Weekly Surveillance Report for Influenza and other Respiratory Viruses. Published December 11th, 2009

Robust Outbreak Surveillance

Table 1: Summary of Texas Influenza (Flu) and Influenza-like Illness (ILI) Activity for the Current Week Texas Surveillance Component

Influenza Surveillance Report

Oregon s Weekly Surveillance Report for Influenza and other Respiratory Viruses

U.S. Army Influenza Activity Report Week Ending 13 October 2012 (Week 41)

Influenza Virologic Surveillance Right Size Project. Launch July 2013

WEEKLY INFLUENZA REPORT

Influenza Season and EV-D68 Update. Johnathan Ledbetter, MPH

Weekly Influenza Surveillance Summary

Data at a Glance: February 8 14, 2015 (Week 6)

April 19 - April 25, 2015 (MMWR Week 16)

Influenza Surveillance Report Saint Louis County Department of Public Health Week Ending 03/10/2019 Week 10

Seasonal Influenza Report

Tracking Disease Outbreaks using Geotargeted Social Media and Big Data

10/29/2014. Influenza Surveillance and Reporting. Outline

December 1-December 7, 2013 (MMWR Week 49)

Weekly Influenza Surveillance Summary

Machine Learning for Population Health and Disease Surveillance

Influenza Surveillance Report Week 47

Transcription:

CHALLENGES IN INFLUENZA FORECASTING AND OPPORTUNITIES FOR SOCIAL MEDIA MICHAEL J. PAUL, MARK DREDZE, DAVID BRONIATOWSKI

NOVEL DATA STREAMS FOR INFLUENZA SURVEILLANCE New technology allows us to analyze new types of data to infer influenza prevalence Especially data from the Web Most (in)famously Google Flu Trends Many other promising sources: Social media (especially Twitter) Mobile apps Wikipedia

WHAT ABOUT FORECASTING? Detection is easy and utility is limited Reliable forecasting is important for making preparations and allocating resources Google Flu has been shown to improve forecasting Shaman and Karspeck (2012) Nsoesie, Marathe, Brownstein (2013) Dugas et al. (2013) Social media hasn t been evaluated yet

CDC PREDICT THE FLU CHALLENGE Contest to forecast the 2013-14 flu season by augmenting existing surveillance with Web data Three metrics: Start of season Peak of season Intensity of season (peak rate and duration)

CDC PREDICT THE FLU CHALLENGE

INFLUENZA DATA So what exactly are we trying to predict? ILINet CDC-run network of thousands of US providers Hospitals report % of outpatients seen for influenza-like illness Weekly reports of estimated ILI prevalence Most commonly used flu metric Data is lagged by a week Real time surveillance doesn t exist through traditional means This is why novel data streams can help

FORECASTING MODEL Forecasts and current-week nowcasts can be produced using standard time series models with the lagged ILINet data Basic autoregressive model: This works quite well Especially for nowcasting

FORECASTING MODEL Can we improve this with social media data? Twitter can give estimates for the current week These estimates can be included in the model

TWITTER FLU DETECTION We used our state-of-the-art Twitter system Lamb et al (2013) and Broniatowski et al (2013) Two streams downloading data since Nov 2011 1% sample and stream filtered for health keywords About 4 million per day Cascade of tweet classifiers: Relevant to health Relevant to flu Indicates flu infection (vs general awareness) Can produce daily or weekly prevalence estimates # of tweets classified as flu infection # of tweets from full sample

RESULTS Mean absolute error when nowcasting: Data Average error per season 2011-12 2012-13 2013-14 ILINet.19.30.35 Twitter.34.36.49 ILINet+Twitter.15.21.21

RESULTS Mean absolute error when forecasting:

RESULTS

INFLUENZA DATA REVISIONS An important caveat about historical data Weekly ILINet values are subject to future revisions We were careful to train the models on the data that would have been available at the time of the prediction But we evaluated on the gold standard value from the final report for the season

INFLUENZA DATA REVISIONS An important caveat about historical data The value initially reported has an average absolute difference from the final value of.18 The value reported after 3 weeks still has an average difference of.10

INFLUENZA DATA REVISIONS Data Average error per season 2011-12 2012-13 2013-14 ILINet (current).19.30.35 ILINet (final).11.24.26 Error is greatly underestimated when using the final gold values instead of values available at time of forecast

INFLUENZA DATA REVISIONS

COMPARISON TO GOOGLE We also compared to Google Flu Trends Data Average error per season 2011-12 2012-13 2013-14 ILINet.19.30.35 ILINet+Twitter.15.21.21 ILINet+Google.20.44.28 Twitter improved nowcasting and forecasting more than Google

CONCLUSION #1 Twitter improves influenza forecasting For a given level of accuracy, including Twitter can give you 2-4 weeks of additional forecasting ability Twitter outperforms Google At least in these three seasons Google recently updated their model so comparison is difficult

CONCLUSION #2 When using historical data, be careful to use data that actually would have been available at the time of model training Others have assumed these were the same Our results showed that this has a substantial effect on performance

CONCLUSION #3 Always compare to a simple time series baseline Our results showed that Twitter by itself is worse than using lagged ILINet data No one had compared to this (using Twitter) But we then showed that you can do even better by combining both!

THANK YOU