STEC Whole Genome Sequencing Project

Eija Trees, PhD, DVM
Chief, PulseNet Next Generation Subtyping Methods Unit
16th Annual PulseNet Update Meeting, August 29th, 2012
National Center for Emerging and Zoonotic Infectious Diseases
Division of Foodborne, Waterborne and Environmental Diseases

Objectives of the Presentation
- Describe the STEC genome sequencing project
- Describe the sequence diversity seen in the STEC population
- Way forward

Outline
- Background and Objectives of the Project
- Materials and Methods
- Results
- Conclusions
- Future Plans
- 100K Pathogen Genome Sequencing Project

Background
- CDC Bioinformatics Blue Ribbon Panel in June 2011
- 2011 end-of-year funding from CDC OD, OID and NC
- Implemented a proof-of-concept bioinformatics core group through an outside contract
- In 2012, focus on two pilot projects:
  - MicrobeNet: development of shared bioinformatic services and web platforms
  - Next Generation PulseNet: whole genome sequencing, comparative genomics, molecular epi

Next Generation PulseNet Project Objectives
- Perform whole genome sequencing, assembly and comparative analysis of 250 STEC strains
  - Reference database
- Understand diversity in the STEC population and determine the correlation between WGS data, PFGE and epi
- Mine the resulting data for targets to predict strain type, virulence and antimicrobial resistance quickly and at high resolution

Next Generation PulseNet Project Objectives (cont'd)
Public health benefits:
- Standardized workflows/SOPs for next generation sequencing across multiple platforms
- Reusable tools, techniques and software pipelines for a wide range of comparative genomics and molecular epidemiologic applications
- Builds capacity for rapid genome-level molecular epidemiology

MATERIALS AND METHODS

Strain Selection
Selection criteria should help us to understand variation:
1. within an outbreak or during a carrier state
2. among epidemiologically unrelated isolates within a serotype and between serotypes

Strain Selection (cont'd)
Variation within an STEC outbreak or a carrier state
- 10 isolates each from 5 outbreaks: O157:H7 (3), O111:NM (1), O145:NM (1)
- Different types of outbreaks: clonal vs. polyclonal, short vs. long lasting, point source vs. vehicle never identified
- 11 isolates recovered from the same patient over the course of 2½ months

Strain Selection (cont'd)
Variation among epidemiologically unrelated STEC strains
- 25 strains each from serogroups O26, O111, and O121
- 5 strains each from serogroups O45, O69, O91, O103, O118, O145
- 1 strain each from serogroups ranked #11-22 in prevalence
- 1 representative each from historical O157 (and non-O157) outbreaks (36 in total)
- Sporadic O157 isolates representing common PFGE patterns (23 in total)
- Other E. coli pathotypes: ETEC (6), EPEC (3), EAggEC (3), EIEC (1)
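As a quick consistency check, the panel described on the two strain-selection slides can be tallied against the 250-strain target stated in the project objectives. This is only a sketch: the group labels are shorthand introduced here, and "#11-22" is read as twelve serogroups, which is an assumption.

```python
# Strain counts as listed on the strain-selection slides.
# The dictionary keys are illustrative labels, not names from the presentation.
panel = {
    "outbreak isolates (10 each from 5 outbreaks)": 10 * 5,
    "serial isolates from one carrier": 11,
    "O26, O111, O121 (25 each)": 25 * 3,
    "O45, O69, O91, O103, O118, O145 (5 each)": 5 * 6,
    # Assumption: ranks #11 through #22 inclusive means 12 serogroups.
    "serogroups ranked #11-22 (1 each)": 22 - 11 + 1,
    "historical outbreak representatives": 36,
    "sporadic O157 with common PFGE patterns": 23,
    "other pathotypes (ETEC, EPEC, EAggEC, EIEC)": 6 + 3 + 3 + 1,
}

total = sum(panel.values())

for label, count in panel.items():
    print(f"{count:4d}  {label}")
print(f"{total:4d}  total")
```

Under these readings the tally comes to 250, matching the number of strains named in the project objectives.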

Whole Genome Sequencing and Assembly
- Illumina HiSeq 2x100 bp; average coverage 400x
  - Pros: highly accurate reads with massive coverage
  - Cons: reads are short
  - Assembled with CLC Bio; average of 200 contigs
- Subset of 99 isolates sequenced with Pacific Biosciences SMRT (ongoing)
  - Pros: longer reads (up to 10 kb)
  - Cons: high error rate; multiple SMRT cells required per strain for adequate coverage
  - Hybrid assembly with error correction using Illumina reads is being optimized
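To give a sense of scale for the 400x figure, average coverage is roughly (number of reads x read length) / genome size. The sketch below inverts that relation. The 5.5 Mb genome size is an assumed round figure for an E. coli O157:H7 genome, not a value from the presentation, and `reads_for_coverage` is a hypothetical helper name.

```python
import math


def reads_for_coverage(coverage: float, genome_size_bp: int, read_length_bp: int) -> int:
    """Smallest read count such that reads * read_length / genome_size >= coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


# Assumption: approximate genome size, used only for illustration.
GENOME_SIZE_BP = 5_500_000

reads = reads_for_coverage(coverage=400, genome_size_bp=GENOME_SIZE_BP, read_length_bp=100)
pairs = math.ceil(reads / 2)  # 2x100 bp means reads come in pairs

print(f"reads needed: {reads:,}")
print(f"read pairs needed: {pairs:,}")
```

With the assumed genome size, 400x coverage corresponds to about 22 million 100 bp reads (11 million pairs) per isolate.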

K-mer-based Approach to Identify SNPs
- Use a suffix array to find all k-mers, where k = 25
- Find candidate variant positions where the k-mer exists and the central base varies
- Align k-mers with possible variations against the target E. coli reference genome
- Eliminate repetitive bases
- Phylogenetic analysis (trees) based on the variations
Example 25-mer: TTGATAGGGCAAAAGCGCCGATTTT (12 bases, the 13th base, 12 bases)
When k = 25 is used, the 13th base position determines whether there is a variation.
- No prior sequence assembly or multiple sequence alignment required
- Fast completion of analysis
- Phylogenies less affected by regions that have undergone strong selection, deletions or horizontal gene transfer
- Dependent on high-quality data
- Reference genome required
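The candidate-finding step can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the project: it swaps the suffix array for a plain dictionary keyed on the two 12-base flanks, ignores reverse complements and sequencing errors, and omits the alignment of candidates to the reference genome. The function name `candidate_snps` is hypothetical.

```python
from collections import defaultdict


def candidate_snps(samples: dict[str, str], k: int = 25) -> dict[tuple[str, str], dict[str, str]]:
    """Return flank pairs whose central base differs between samples.

    samples maps a sample name to its sequence. The result maps
    (left_flank, right_flank) to {sample_name: central_base}.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that a single central base exists")
    half = k // 2

    # (left flank, right flank) -> sample name -> set of central bases seen
    centers: dict[tuple[str, str], dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for name, seq in samples.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            centers[(kmer[:half], kmer[half + 1:])][name].add(kmer[half])

    variants = {}
    for flanks, by_sample in centers.items():
        # Require the flanks in every sample, so presence/absence is not called a SNP.
        if len(by_sample) < len(samples):
            continue
        # Drop flanks with more than one central base inside a single sample
        # (a crude stand-in for eliminating repetitive positions).
        if any(len(bases) != 1 for bases in by_sample.values()):
            continue
        calls = {name: next(iter(bases)) for name, bases in by_sample.items()}
        if len(set(calls.values())) > 1:
            variants[flanks] = calls
    return variants


# The 25-mer from the slide, and a copy with the 13th base changed from A to G.
ref = "TTGATAGGGCAAAAGCGCCGATTTT"
alt = ref[:12] + "G" + ref[13:]
print(candidate_snps({"isolate_a": ref, "isolate_b": alt}))
```

Keying on the flanks rather than the whole k-mer is what lets two k-mers that differ only at the 13th base land in the same bucket, which is the comparison the slide describes.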

RESULTS

Probing Diversity within an Outbreak: O157:H7 in MN Daycare (2008)
[Figure: PFGE-XbaI and PFGE-BlnI patterns for isolates 2011EL-2091, 2011EL-2098, 2011EL-2101, 2011EL-2094, 2011EL-2090, 2011EL-2092, 2011EL-2093, 2011EL-2097, 2011EL-2099, 2011EL-2096; annotated isolates: XbaI/BlnI PFGE variant, MLVA variant, BlnI PFGE variant]
Relatively clonal, short-lasting point source outbreak

Probing Diversity within a Clonal Point Source O157 Outbreak: MN Daycare
[Figure: SNP plot (legend: A, T, G, C); annotated isolates: Xba-Bln variant, MLVA variant, Bln variant]
Reference sample: 2011EL-2090_O157:H7_EXHX010589_EXHA261376_main MLVA
The total number of SNPs found across all the samples from this outbreak at 100% frequency is 39

Probing Diversity within Clonal Long Lasting O157:H7 Outbreak (2004-2005)
[Figure: PFGE-XbaI and PFGE-BlnI patterns for isolates ME 05M-MIS-0434_3, MI 05PF000002_2, NY BAC0400012738_3, NY BAC0400016061_3, OH 2005305053_3, OH 2005306272_3, OK 05OKE0971_3, PA 05E00212_3, PA 05E00525_3, VA 04-1576_3]
- 110 two-enzyme matches in 20 states within 15 months
- Rare PFGE type
- An indistinguishable new MLVA pattern
- Vehicle never identified

Probing Diversity within a Clonal Long Lasting O157 Outbreak: Not Resolved
[Figure: SNP plot (legend: A, T, G, C)]
Reference sample: K1792_O157:H7_EXHX010086_EXHA260576
The total number of SNPs found across all the samples from this outbreak at 100% frequency is 68

Probing Diversity within Polyclonal Outbreak: O111:NM in OK Restaurant (2008)
[Figure: dendrogram of combined PFGE-BlnI + PFGE-XbaI patterns, similarity scale 90-100]

Key            Serotype         PFGE-XbaI-pattern  PFGE-BlnI-pattern
OK 08OKE1082A  E. coli O111:NM  EXDX010050         EXDA260174
OK 08OKE1248   E. coli O111:NM  EXDX010050         EXDA260174
OK 08OKE1296   E. coli O111:NM  EXDX010050         EXDA260174
OK 08OKE1310   E. coli O111:NM  EXDX010050         EXDA260174
OK 08OKE1314   E. coli O111:NM  EXDX010050         EXDA260174
OK 08OKE1079B  E. coli O111:NM  EXDX010175         EXDA260174
OK 08OKE1244   E. coli O111:NM  EXDX010337         EXDA260190
OK 08OKE1115   E. coli O111:NM  EXDX010327         EXDA260183
OK 08OKE1265   E. coli O111:NM  EXDX010320         EXDA260177
OK 08OKE1308   E. coli O111:NM  EXDX010334         EXDA260043

- 1 Xba variant, 4 Xba-Bln variants
- Vehicle not identified

Probing Diversity within Polyclonal O111 Outbreak: OK Restaurant
[Figure: SNP plot (legend: A, T, G, C, N)]
Reference sample: K6722_O111:IsolatetoCDC_EXDX010050_EXDA260174
The total number of SNPs found across all the samples from this outbreak at 100% frequency is 947

Probing Diversity during a Carrier State
[Figure: PFGE-XbaI and PFGE-BlnI patterns for isolates CD 2011EL-2103, CD 2011EL-2104 (2010C-4979C1), CD 2011EL-2105, CD 2011EL-2106, CD 2011EL-2107, CD 2011EL-2108, CD 2011EL-2109, CD 2011EL-2111, CD 2011EL-2112, CD 2011EL-2113, CD 2011EL-2114; one isolate annotated as Bln variant]
- Collected from the same person within a period of 2½ months
- An indistinguishable MLVA pattern

Probing Diversity during a Carrier State: Serial Isolates from the Same Person
[Figure: SNP plot (legend: A, T, G, C, N); annotated isolate: Bln variant]
Reference sample: 2011EL-2103_O157:H7_EXHX010272_EXHA260195
The total number of SNPs found across all the samples from this outbreak at 100% frequency is 108

K-mer-based Phylogenetic Tree for STEC O157
[Figure: k-mer-based phylogenetic tree shown alongside PFGE; labeled clusters: Outbreak 1 (not solved, 2004-05), Outbreak 2 (daycare, 2008), Outbreak 3 (Topps ground beef, 2007), long-term carrier patient]
93 to 2101 SNPs compared to the reference Sakai

K-mer-based Phylogenetic Tree for STEC O111
[Figure: k-mer-based phylogenetic tree; labels: 11128, restaurant outbreak (2008)]
195 to 1229 SNPs compared to the reference

K-mer-based Phylogenetic Tree for STEC O103
[Figure: k-mer-based phylogenetic tree; label: 12009]
1789 to 20,094 SNPs compared to the reference

Conclusions Based on the Illumina Data
- K-mer-based clustering appears to correlate well with epidemiological and PFGE data
- Variation within a (relatively) clonal outbreak: < 100 SNPs
- Variation between unrelated strains of the same serotype: hundreds to ~2000 SNPs
- Variation between serotypes: > 10,000 SNPs
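The ranges above can be written down as a rough lookup, shown here only to make the gaps between them explicit. The function name, the category labels and the exact cutoffs (100, 2000, 10,000) are assumptions read off the slide's approximate wording; they are not validated decision thresholds.

```python
def interpret_snp_count(n_snps: int) -> str:
    """Map a SNP count onto the approximate ranges reported in the conclusions."""
    if n_snps < 0:
        raise ValueError("SNP count cannot be negative")
    if n_snps < 100:
        return "range seen within a (relatively) clonal outbreak"
    if n_snps <= 2000:
        return "range seen between unrelated strains of the same serotype"
    if n_snps > 10_000:
        return "range seen between serotypes"
    # The slide gives no guidance for counts between ~2000 and 10,000.
    return "between the reported ranges"


# SNP totals reported on the results slides.
reported = {
    "MN daycare outbreak": 39,
    "long-lasting O157:H7 outbreak": 68,
    "carrier-state serial isolates": 108,
    "polyclonal O111 restaurant outbreak": 947,
}
for label, count in reported.items():
    print(f"{label}: {count} SNPs -> {interpret_snp_count(count)}")
```

The carrier-state series (108 SNPs) lands just above the 100 cutoff and the polyclonal outbreak (947 SNPs) falls well inside the "unrelated strains" range, which illustrates why these figures are better read as observed ranges than as classification rules.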

Future Plans
- Improve the assemblies using PacBio sequences
- Make the draft genomes publicly available
- Additional sequencing to close reference genomes
- Define a cross-serotype gene set to be used to determine strain subtype, virulence and antimicrobial resistance profiles ("super MLST")
  - Presence / absence
  - SNPs
- Add prospective strains from outbreaks to the database

Questions to be Answered
- Which clustering method will work for surveillance?
  1. K-mer-based clustering as a primary tool with gene-based super MLST as a secondary tool, OR
  2. Gene-based super MLST as a primary tool
- How to standardize data across the platforms? Different error profiles
- How to get an actionable report out from raw reads?

100K Pathogen Genome Sequencing Project
- Dilemma: the safety and security of the world food supply is hindered by the lack of food-related genomes
- Partnership between BGI and UC Davis: http://100kgenome.vetmed.ucdavis.edu
- Goal: sequence 100,000 foodborne pathogen isolates within 5 years
- Sequences will be made publicly available
- Steering committee: BGI; UC Davis School of Veterinary Medicine; FDA; CDC; Agilent Technologies, Inc.; Mars, Inc.

100K Pathogen Genome Sequencing Project (cont'd)
- Isolate priority list (Tier 1, Tier 2, Tier 3): Salmonella, Yersinia, toxigenic bacilli, Campylobacter, Shigella, Norovirus, E. coli, Clostridium, Hepatitis, Vibrio, Listeria, Enterococcus, Cronobacter, Rotavirus
- Seeking isolates from diverse types of food, distinct environments, diverse regions of the world and longitudinal isolate collections

Acknowledgements
- CDC/EDLB: Peter Gerner-Smidt, John Besser, Efrain Ribot, Nancy Strockbine, Cheryl Tarr, Lee Katz, Amber Schmidtke, Ashley Sabol, Haley Martin, Devon Stripling
- MN Dept of Health: Dave Boxrud
- CDC Bioinformatics Core Group: Duncan MacCannell, Ryan Weil, Shankar Changayil, Satishkumar Ranganathan, Kun Zhao
- Expression Analysis: Stephen Sifed, Victor Weigman, Steve Mcphail

Thank You for Your Attention! Questions?
For more information please contact Centers for Disease Control and Prevention
1600 Clifton Road NE, Atlanta, GA 30333
Telephone: 1-800-CDC-INFO (232-4636) / TTY: 1-888-232-6348
E-mail: cdcinfo@cdc.gov
Web: http://www.cdc.gov
The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.