AN EPIDEMIOLOGIC MODELING AND DATA INTEGRATION FRAMEWORK

Similar documents
Abstract of the PhD Thesis Intelligent monitoring and early warning system of pandemic viruses

Modern Epidemiology A New Computational Science

Stochastic Modelling of the Spatial Spread of Influenza in Germany

4.3.9 Pandemic Disease

Strategies for containing an emerging influenza pandemic in South East Asia 1

Public Health Resources: Core Capacities to Address the Threat of Communicable Diseases

Management of Pandemic Influenza Outbreaks. Bryan K Breland Director, Emergency Management University of Alabama at Birmingham

Cellular Automata Model for Epidemics

Agent-Based Models. Maksudul Alam, Wei Wang

arxiv: v1 [cs.si] 29 Jan 2018

Simulation of HIV/AIDS distribution using GIS based cellular automata model.

A GLOBAL STOCHASTIC MODELING FRAMEWORK TO SIMULATE AND VISUALIZE EPIDEMICS. Saratchandra Indrakanti. Thesis Prepared for the Degree of

Background Describing epidemics Modelling epidemics Predicting epidemics Manipulating epidemics

Case Studies in Ecology and Evolution. 10 The population biology of infectious disease

Towards Real Time Epidemic Vigilance through Online Social Networks

Exercises on SIR Epidemic Modelling

Mathematical Modelling of Effectiveness of H1N1

A Flexible Automata Model for Disease Simulation

Acute respiratory illness This is a disease that typically affects the airways in the nose and throat (the upper respiratory tract).

MODELLING INFECTIOUS DISEASES. Lorenzo Argante GSK Vaccines, Siena

TJHSST Computer Systems Lab Senior Research Project

Caribbean Actuarial Association

Avian Influenza (Bird Flu) Fact Sheet

Agricultural Outlook Forum Presented: February 16, 2006 THE CURRENT STATE OF SCIENCE ON AVIAN INFLUENZA

Pandemic Risk Modeling

Sensitivity analysis for parameters important. for smallpox transmission

Mathematics for Infectious Diseases; Deterministic Models: A Key

Epidemiology Treatment and control Sniffles and Sneezes Mortality Spanish flu Asian flu Hong Kong flu The Swine flu scare

Influenza: The Threat of a Pandemic

What do epidemiologists expect with containment, mitigation, business-as-usual strategies for swine-origin human influenza A?

Lesson 20 Study Guide: Medical Biotechnology Pandemic Flu & Emergent Disease

Identifying, Preparing for & Reducing Pandemic Risk

Avian influenza Avian influenza ("bird flu") and the significance of its transmission to humans

Biological Event Modeling for Response Planning

An Agent-Based Model for Containing a Virus Outbreak

A modelling programme on bio-incidents. Submitted by the United Kingdom

Pandemic Influenza: Hype or Reality?

Public Health Management and Preparedness in Wildlife Zoonotic Disease Outbreak

PANDEMIC INFLUENZA PLAN

Burton's Microbiology for the Health Sciences

Prepare to Care Pandemic Planning at Fraser Health

Influenza. Gwen Clutario, Terry Chhour, Karen Lee

Network Science: Principles and Applications

SEPTEMBER VIENNA, AUSTRIA

TJHSST Computer Systems Lab Senior Research Project

OIE Situation Report for Highly Pathogenic Avian Influenza

Image of Ebola viruses exiting host cells HUMAN VIRUSES & THE LIMITATION OF ANTIVIRAL DRUG AGENTS

Modeling of epidemic spreading with white Gaussian noise

Title: Understanding the role of linker histone in DNA packaging with mathematical modelling

Simulation of the Spread of Epidemic Disease Using Persistent Surveillance Data

Dynamics and Control of Infectious Diseases

Influenza. Paul K. S. Chan Department of Microbiology The Chinese University of Hong Kong

MODELLING THE SPREAD OF PNEUMONIA IN THE PHILIPPINES USING SUSCEPTIBLE-INFECTED-RECOVERED (SIR) MODEL WITH DEMOGRAPHIC CHANGES

Ralph KY Lee Honorary Secretary HKIOEH

TABLE OF CONTENTS. Peterborough County-City Health Unit Pandemic Influenza Plan Section 1: Introduction

Modeling the Effects of HIV on the Evolution of Diseases

Best Practice Guideline for the Workplace During Pandemic Influenza OHS & ES

AVIAN FLU BACKGROUND ABOUT THE CAUSE. 2. Is this a form of SARS? No. SARS is caused by a Coronavirus, not an influenza virus.

Parasites transmitted by vectors

International Journal of Pharma and Bio Sciences

Epidemic Modelling Using Cellular Automata

Mathematical Formulation and Numerical Simulation of Bird Flu Infection Process within a Poultry Farm

Tom Hilbert. Influenza Detection from Twitter

The mathematics of diseases

Image of Ebola viruses exiting host cells HUMAN VIRUSES & THE LIMITATION OF ANTIVIRAL DRUG AGENTS

Core 3: Epidemiology and Risk Analysis

AOHS Global Health. Unit 1, Lesson 3. Communicable Disease

OIE Situation Report for Avian Influenza

= Λ μs βs I N, (1) (μ + d + r)i, (2)

Spread of Human Diseases. [Name of writer] [Name of institution]

Simulation-based optimization of mitigation strategies for pandemic influenza.

Contents. Mathematical Epidemiology 1 F. Brauer, P. van den Driessche and J. Wu, editors. Part I Introduction and General Framework

OIE Situation Report for Avian Influenza

Supplemental Resources

Introduction to Reproduction number estimation and disease modeling

FOOT-AND-MOUTH DISEASE (FMD) VIRUSES THE AIRBORNE TRANSMISSION OF. Elisabeth Schachner, Claudia Strele and Franz Rubel

OIE Situation Report for Highly Pathogenic Avian Influenza

Putting it together: The potential role of modeling to explore the impact of FMD

Avian Influenza and Other Communicable Diseases: Implications for Port Biosecurity

Epidemic Modelling: Validation of Agent-based Simulation by Using Simple Mathematical Models

1) Complete the Table: # with Flu

SIS-SEIQR Adaptive Network Model for Pandemic Influenza

DRAFT WGE WGE WGE WGE WGE WGE WGE WGE WGE WGE WGE WGE WGE WGE GETREADYNOWGE GETREADYNOWGE GETREADYNOWGE GETREADYNOWGE.

OIE Situation Report for Highly Pathogenic Avian Influenza

Biostatistics and Computational Sciences. Introduction to mathematical epidemiology. 1. Biomedical context Thomas Smith September 2011

Bioterrorism and the Pandemic Potential

H1N1 Flu Virus Sudbury & District Health Unit Response. Shelley Westhaver May 2009

Modelling to Contain Pandemic Influenza A (H1N1) with Stochastic Membrane Systems: A Work-in-Progress Paper

Modelling the Dynamic of the Foot-and Mouth Disease in England 2001

Influenza. Giovanni Maciocia

Global Challenges of Pandemic and Avian Influenza. 19 December 2006 Keiji Fukuda Global influenza Programme

Epidemiological Situation on Pandemic Influenza H1N in the World and in Japan

LECTURE OUTLINE. B. AGENT: Varicella-zoster virus. Human herpes virus 3. DNA virus.

Definitions. DefinitionsNumbers and Rates. 3 Important Kinds of Rates. 3 Important Types of Rates. Introduction to Epidemiology in the Community

The roadmap. Why do we need mathematical models in infectious diseases. Impact of vaccination: direct and indirect effects

Contact Tracing in Health-care Information System - with SARS as a Case Study

Conflict of Interest and Disclosures. Research funding from GSK, Biofire

ESTIMATION OF THE REPRODUCTION NUMBER OF THE NOVEL INFLUENZA A, H1N1 IN MALAYSIA

A. No. There are no current reports of avian influenza (bird flu) in birds in the U.S.

Transcription:

AN EPIDEMIOLOGIC MODELING AND DATA INTEGRATION FRAMEWORK Pfeifer B 1, Seger M 1, Netzer M 1, Osl M 1, Modre-Osprian R 2, Schreier G 2, Hanser F 3, Baumgartner C 1 Abstract In this work a cellular automaton software package for simulating different infectious diseases, storing the simulation results in a data warehouse system, and analyzing the obtained results in order to generate predictions models or contingency plans is proposed. The Brisbane H3N2 flu virus, which has been spreading this winter season, was used for spreading simulation in federal state of Tyrol. The simulation-modeling framework consists of an underlying cellular automaton. The generated data are stored in the back room of the data warehouse using the Talend Open Studio software package, and subsequent statistical and data mining tasks are performed using the Knowledge Discovery in Database Designer (KD 3 ). The obtained simulation results were used for generating prediction models for all nine federal States of Austria, and were compared to flu outbreaks in the past. 1. Introduction In the field of model building and simulation in medical informatics, methods and tools are developed for acquiring data, which are then processed and analyzed using bio statistical and data mining methods. By applying these methods new knowledge is generated that helps contributes towards a better understanding of the underlying processes, which enables to generate and improve existing strategies, treatments, and workflows. Communicable or infectious diseases kill more people worldwide than any other single cause. Pathogenic agents such as bacteria, viruses, parasites or fungi are responsible for those diseases. The disease transmission from individual to individual occurs by physical contact with an infected individual, usage and handling of contaminated substances like food and liquids, airborne inhalation, body fluids, vectored infection to name just a few. When studying historical documents some endemic and/or pandemic outbreaks are described. In the years from 1347 to 1352 over 25 million Europeans (about one third of Europe s population at this time) were killed by the plague, which is better known as black death. In 1896 the next outbreak of the plague occurred, and spread to nearly every part of the globe, and killed about 12 mil- 1 UMIT, Institute of Biomedical Engineering, Eduard Wallnöfer Centre 1, 6060 Hall/Tirol 2 ARC Seibersdorf research GmbH, Innsbruck, Austria 3 UMIT, Research Division for Pervasive Health, Eduard Wallnöfer Centre 1, 6060 Hall/Tirol 33

lion individuals up to 1945. The Spanish Flu outbreak occurred in the years between 1918 and 1920 and was able to kill between 20 and 50 million people, especially young adults and teens with well working immune systems. In 1957 about one million individuals fell prey to the Asian Flu, and the Hong Kong Flu killed about 700.000 individuals. The Acquired Immune Deficiency Syndrome (AIDS), which is caused by the immunodeficiency virus (HIV), was first recognized in the 1980s. Its death toll lies by about 20 million individuals. The HI-Virus infects about 50 million individuals all over the world. More historical data can be obtained at [1]. This shows that infectious diseases and the generation of contingency plans and strategies are of importance to check an outbreak, which means to save lives and to minimize economical impact. As every year the flu (especially during winter season) afflicts Austria, as well as other countries. During the Christmas time the Influenza type A subtype H3N2 virus, named Brisbane, has exponentially been spreading [2]. Typically, about 10 percent of adults and 15 percent of children fall ill. The symptoms are: sudden high fever, shooting pains and arthralgia. For modeling different communicable diseases several approaches were presented in the past [3-5]. Our proposed framework enables to simulate the spatio-temporal individual based behavior of different communicable diseases and to persistently store the simulation results for further analysis using data mining techniques. 2. Methods 2. 1. Infectious Disease Simulation The classic epidemic model, which is known as S-I-R model, is widely used for simulating the behavior of communicable diseases [3-5]. In the S-I-R model the class S denoted the number of susceptibles, class I denotes the number of infectives, and class R refers to the number of recovered people in the model. The model is given by the initial value problem: ds dt = ais, S 0 0 di dt = ais bi, I 0 0 dr dt = bi, R 0 0 where the sum of the model classes S(t)+I(t)+R(t)=N, with N being the number of all individuals. However, the classic SIR model is unable to take account of natural birth and death, immigration and emigration, passive immunity and spatial arrangement. Therefore, partial differential equations (PDE) can be used. Following PDE can be used for modeling infection diffusion through space: 34

ds dt = (ai + I β( 2 x + 2 I 2 y ))S, S 0 2 0 di dt = (ai + I β( 2 x + 2 I 2 y )) bi, I 0 2 0 dr dt = bi, R 0 0 With these models it is possible to simulate the spreading of a disease over a population in space and time, but if tracking of individuals or geographical conditions and demographic realities are needed these models are also inadequate, and therefore, cellular automata (CA) modeling approach can be used. For this purpose cellular automata (CA) can be used. A CA is a dynamical system in which time and space is discrete; it is specified by a regular discrete lattice of cells and boundary conditions, a finite set of cells and states, a defined neighborhood, and a state transition function for cell evolution over time. A CA can be formal described as a 9-tuple CA = (C,N,,Ψ,Q,δ,σ,q 0,F), where C is the cell adjustment, N is the neighborhood definition, denotes the input alphabet, Ψ the output alphabet, and Q describes the set of internal cell states. δ denotes the state transition function with δ : Q Q n i Q, and σ is the output function with σ : Q Ψ. q 0 denotes the initial state with q 0 Q, and F denotes the finites states with F Q, respectively. The disease modeling and simulation framework was implemented using Java SE 1.6. As object oriented paradigms and software patterns were used, the software package is highly generic, which means, that approaches based on cellular automata can be easily integrated and implemented using this framework. Several classes exist in the kernel of the simulation and modeling package. Only the two generic CA package classes are described here briefly. The class CellularAutomaton defines a generic cellular automaton and provides the abstract method compute(), which is responsible for performing one simulation step over the cells. The class Cell represents one generic cell with the associated properties and neighborhood relation. The abstract method performcellaction() has to be overridden for performing the cell state transition function. Furthermore, the resulting states and attributes are persistently stored for each cell and each individual for further analysis; for each time step also a snapshot of the geographical map is generated and stored as portable network graphics (PNG) file. 2. 2. Data Integration A data warehouse is defined as subject oriented, integrated, non-volatile collection of integrated data [6]. The generated life sciences data need to be stored in a data warehouse in order to perform further analysis tasks. The underlying schema used in the backroom component is the so-called star schema. This schema is characterized by one or more large fact tables that contain a number of the primary information in the data warehouse, and a number of much smaller dimension tables, each of which contains information about the entries for a particular attribute in the fact table. A query on such a schema is characterized as a join between fact tables and several dimensional tables. 35

For the data integration tasks the freely available open source software package provided by Talend Open Data Solution is used [7]. The Talend Open Studio enables the backroom specialist to develop data extraction, transformation (cleansing) and loading (ETL) components. The software then generates Java code that can be used in own software solutions. Furthermore, it is possible to implement own components, if the offered components do not fulfill the requirements. Figure 1 shows the graphical user interface of the Talend Open Studio design tool. Figure 1: Talend Open Studio Designer, which was used for integrating the generated simulation data. For more information see text. 2. 3. Knowledge Discovery in Simulation Data When performing knowledge discovery several steps are usually involved. These steps incorporate focusing and describing the problem, preprocessing and transformation of the data, performing statistical analyses and data mining algorithms, and a concluding evaluation [8]. As all the simulation result data is stored in the data warehouse system the preprocessing and transformation tasks can be reduced. So, one can focus on building analyzing workflows in order to generate new knowledge. Therefore, using the KD 3 (Knowledge Discovery in Database Designer) [9] software package the user can focus on building analyzing workflows to interpret the simulation results. Such a workflow is characterized by a repeatable pattern activity, which is used for solving defined processes using different parameters and data sets in simulation scenarios. So-called functional objects (FOs) are the key elements of the application. A FO consists of in- and out ports, where the data set can be transported from one component to another. By implementing new FOs the KD 3 application can be easily expanded by new functionality. The new objects are then loaded into the KD 3 workspace by using the Java Reflection API. Figure 2 depicts the KD3 designer application that is implemented for data mining and analyzing the Brisbane flu simulation. 36

3. Results Figure 2: Screenshot of the knowledge-mining task of the Brisbane flu simulation results. In this study the Brisbane flu was simulated for the federal state of Tyrol. The seed point of the infection was set to the capital Innsbruck. Furthermore, using two different drugs medical treatment of the infected individuals was performed. Medication 1one is able to reduce lethality and severe symptoms by 55 percent, and medication two by 45 percent. Furthermore, the social behavior of the individuals changes during the disease spread, which means that the individuals try to avoid unnecessary social contact. This would also occur in real situations. The run time of the simulation was 38 minutes for 365 simulated days. For running the simulation an Apple X-Serve 1.1 OS X Server Version 10.4.10 with 2x2GHz Dual Core Intel Xeon processor and 2GB of RAM was used. The simulation data was integrated using the Talend Software and analyzed with the KD 3 workflow mining tool. Although, only the federal state of Tyrol was simulated it is possible to give a prediction for the other nine federal states of Austria. The simulation results show that approximately 240 fatal outcomes could occur in Tyrol, which is 8% of the total fatal outcome that could occur in Austria. Figure 3a depicts the percentages of the fatal outcomes in the nine Austrian states. Furthermore, the number of possible infections for the federal state of Tyrol was estimated to be 44.787 individuals. The infection prediction for the other states is depicted in Figure 3b. 37

Figure 3: (a) Percentage of the estimated outcomes in all nine federal states of Austria; (b) Estimated infections for the nine states of Austria; for more information see text. Figure 4 depicts the different states of the possible individual classes. During the first 140 days most individuals get infected, after that initial infection period individuals become susceptible again, which means that the disease might infect them again. Some of the individuals get permanently immune against the disease. Figure 4: The sphered line (S) is used for the number of susceptibles in the population, (I) shows infected individuals, (Immune) shows immune individuals and the (R) line is used for depicting removed individuals. Figure 5 depicts the spatial arrangement of the disease spreading. The black markers used in Figure 5 demonstrate, where infected individuals are located and how the infection spreads over the population in space and two distinct time instances. Figure 5: Population density map of the federal state of Tyrol with superimposed infection map generated by the simulation run. The left figure shows the situation 20 days after the infection has started. The right figure indicates the spreading situation after 120 days. It can be can seen that after 120 days nearly every part of federal state Tyrol is involved, but the infection density is much lower compared to the beginning of the infection. 38

4. Discussion and Outlook The framework enables the simulation of different infectious diseases. All the generated data is subsequent stored in a data warehouse system, which enables to perform analytical tasks for generating prediction models or improvement of contingency plans. Furthermore, our simulations are in close agreement to past observations and records [10,11]. Public health offices could use the tool for getting deeper understanding in the spreading mechanisms and for preventing serious long-term economic repercussions. 5. Acknowledgement We thank the Federal Ministry of Economics and Labour of the Republic of Austria, the Tyrolean Future Foundation and the Competence centre HITT-health Information Technologies Tyrol for funding this work. 6. References [1] WORLD HEALTH ORGANIZATION (WHO). Available at: http://www.who.int [2] Austria to face flu epidemic in new year (2009); http://www.wienerzeitung.at/desktopdefault.aspx?tabid=4975&alias=wzo&cob=388656 [3] K. LICHTENBERGER. Stochastic cellular automaton models in disease spreading and ecology. Oct 2005. Doctoral Thesis, TU-Graz [4] HW. HETHECOTE. The mathematics of infectious diseases. Society for Industrie and Applied Mathematics. Vol 42, pp.599-653, Oct 2000. [5] S. EUBANK, H. GUCLU, V. A. KUMAR, M. MARATHE et al., Modelling disease outbreaks in realistic urban social networks, Nature, Jan 2004. [6] R. KIMBALL AND J. CASERTA, The Data Warehouse ETL Toolkit, Wiley Publishing Inc., 2004. [7] TALEND OPEN STUDIO. www.talend.com, 2008. [8] U. M. FAYYAD, G. PIATETSKY-SHAPIRO, AND P. SMYTH, From data mining to knowledge discovery: An overview, in Advances in Knowledge Discovery and Data Mining, pp. 1 34, 1996. [9] PFEIFER B, TEJADA MM, KUGLER K, OSL M, NETZER M, SEGER M, MODRE-OSPRIAN R, SCHREIER G, TILG B. A Biomedical Knowledge Discovery in Databases Design Tool - Turning Data into Information. ehealth 2008, Wien, 29.-30. Mai 2008. [10] DIAGNOSTISCHES INFLUENZA NETZWERK ÖSTERREICH: http://www.influenza.at/influenza-main.htm [11] KÄRNTNER GEBIETSKRANKENKASSE. Influenza eine nicht ungefährliche Krankheit: http://www.kgkk.at/portal/index.html;jsessionid=c7a0928475e0e293ff9cc58b5bf544c7?ctrl:cmd=render&ctrl:wi ndow=kgkkportal.channel_content.cmswindow&p_menuid=7602&p_tabid=2&p_pubid=11773 39