Linking genetic and epidemiological datasets: the challenges of reconstructing transmission trees for livestock viral diseases Donald King donald.king@pirbright.ac.uk Vesicular Disease Reference Laboratory Group FMD and SVD Reference Laboratories
BBSRC National Virology Centre: The Plowright Building 2015: Occupied new high containment laboratory Houses all work with live FMD and International Reference Laboratories for FMD, BT, PPR, ASF, AHS, Capripox
Foot-and-Mouth Disease Virus. Family Picornaviridae, genus Aphthovirus. Causes a highly contagious disease of cloven-hoofed livestock (cattle, sheep, goats and pigs). Positive-sense ssRNA genome, ~8300 nt. [Figure: genome map on a 0 to 8 kilobase scale: VPg, 5' UTR with poly(C) tract, L, 1A (VP4), 1B (VP2), 1C (VP3), 1D (VP1), 2A, 2B, 2C, 3A, 3B, 3C, 3D, 3' UTR and poly(A) tail; functional labels: protease, capsid, carboxy-terminal self-cleaving, membrane-binding, NTP binding, genome-linked (VPg), protease, polymerase; primary cleavages by L, 2A and 3C; secondary cleavages by 3C, with one marked "1B/RNA?".] 7 serotypes (O, A, C, SAT1, SAT2, SAT3 and Asia-1) including numerous subtypes. VP1 sequence data widely used for strain characterisation.
Why do we sequence FMDV?... Multiple virus serotypes/topotypes/strains Monitoring global patterns of virus distribution Tracing sources of outbreaks (who-infected-who?) Early recognition of the emergence of new lineages Antigenic prediction and vaccine selection
2015: Changing epidemiological patterns O: North Africa Multiple lineages East Asia Outbreaks reported to the OIE (change of epidemiological status): http://www.oie.int/wahid-prod/public.php?page=home
Example: Molecular Epidemiology of O/ME-SA/Ind-2001. FMD outbreaks in Libya, Tunisia and Algeria. New lineage introduced into North Africa. Source: Indian sub-continent. Increased onward threats to Morocco and Europe? [Figure: VP1 phylogenetic tree of the O/ME-SA/Ind-2001 lineage (sublineages a to d) with numeric node labels; tip labels include 2013-2014 isolates from Algeria, Tunisia, Libya, Saudi Arabia, the UAE, Nepal, Bhutan and India, plus older reference sequences; PanAsia and PanAsia-2 lineages are also labelled; * marks sequences from other laboratories (IZSLER, PD-FMD) or GenBank.] Knowles et al., (2015) TED
Limitations of VP1 sequence data VP1 nucleotide sequences can be used to provide evidence to support transboundary movements of FMDV Useful for regional and country-level epidemiology VP1 not typically useful to resolve transmission trees within outbreak clusters (relationships between infected farms) Can we use full genome sequence data to increase the resolution of analysis and trace the spread of FMDV during an outbreak? Sequencing improvements make rapid full-genome sequencing achievable Sanger methods NGS approaches
Acknowledgements: Caroline Wright Jan Kim Begoña Valdazo- González Kasia Bankowska Antonello Di-nardo Eleanor Cottam Faizah Hamid Guido König Graham Freimanis David King Müge Fırat-Saraç Nick Knowles Jemma Wadsworth Richard Orton Marco Morelli Dan Haydon
Increased number of informative sites. Retrospective analysis of the 2001 outbreak in the UK. Analysis of 23 complete genome sequences (consensus): 197 sites with nt substitutions. [Figure: substitutions plotted by genome position (0 to 8000) against the genome map L, VP4, VP2, VP3, VP1, 2A, 2B, 2C, 3A, 3B, 3C, 3D, poly(A).] Cottam et al., (2006), J. Virol
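A minimal sketch of how sites with nucleotide substitutions could be counted across aligned consensus genomes. The function name, the toy sequences and the choice to ignore `N` and `-` are illustrative assumptions, not the method used by Cottam et al.

```python
def variable_sites(alignment):
    """Return 0-based column indices at which the aligned sequences differ.

    `alignment` maps a (hypothetical) sample name to an aligned sequence.
    Ambiguous bases (N) and gaps (-) are ignored: an assumption of this sketch.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos in range(lengths.pop()):
        column = {seq[pos] for seq in alignment.values()} - {"N", "-"}
        if len(column) > 1:  # more than one base observed at this column
            sites.append(pos)
    return sites


# Toy alignment (invented data, 10 nt instead of ~8300 nt).
toy = {
    "farm_1": "ACGTACGTAC",
    "farm_2": "ACGTACGTAT",
    "farm_3": "ACGAACGTAT",
}
sites = variable_sites(toy)
print(len(sites), "variable sites at positions", sites)
```

Applied to full genomes rather than VP1 alone, the same count grows simply because more columns are compared, which is the point the slide makes.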
Practical uses of full-genome sequences: farm-to-farm level resolution (UK 2007). [Figure: TCS representation of sequences recovered from farms, with putative intermediates; nodes labelled IAH2, MAH, IP1b(1), IP1b(2), IP2b, IP2c, IP3b, IP3c, IP4b, IP5, IP6b, IP7 and IP8.] Expected changes for each farm-to-farm transmission link: 4.3 ± 2.1 nts for 2001 (Cottam et al., (2008) Proc. Roy. Soc. B). Discriminates between viruses recovered from infected farms. Data were provided rapidly (in real time) to support the UK eradication programme. Provided evidence for the existence of IP5 (a farm with FMD serology-positive cattle and sheep), which bridges the gap between two phases of the outbreak. Cottam et al (2008) PLoS Pathogens
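The "expected changes per transmission link" figure is a mean and standard deviation of nucleotide differences along putative farm-to-farm links. A sketch of that arithmetic, assuming aligned consensus genomes and a hand-supplied list of links; the farm names and sequences are invented.

```python
from statistics import mean, stdev


def nt_differences(a, b):
    """Count positions where two aligned sequences carry different unambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and x in "ACGT" and y in "ACGT")


# Hypothetical consensus sequences and putative source -> recipient links.
genomes = {
    "IP_a": "ACGTACGTACGT",
    "IP_b": "ACGTACGAACGT",
    "IP_c": "ACTTACGAACGA",
}
links = [("IP_a", "IP_b"), ("IP_b", "IP_c")]

changes = [nt_differences(genomes[src], genomes[dst]) for src, dst in links]
print(f"changes per link: {mean(changes):.1f} ± {stdev(changes):.1f} nt")
```

On real data the links themselves are uncertain, so a summary like this depends on which transmission tree is assumed.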
Level of individual-to-individual (cattle): experimental infection chain. Full genome sequences can resolve down to individual transmission events (direct and indirect contact). However, interpretation can be sensitive to the particular sample type analysed (acute vs carrier samples). [Figure: network linking IAH2* to samples B1.2D.V, B2.2D.P, B2.4D.P, B2.6D.P, B2.6D.V, B2.32D.P, B3.3D.V, B4.9D.V and B5.9D.V, with numeric labels on some edges.] Juleff et al., (2013) J. Gen. Virol.
Impact of within-herd genetic variation upon inferred transmission trees. 45 complete genomes from UK 2007. Challenging data set due to a long branch length on one farm. Random selection of a single sequence from each farm: 6% of tree topologies were identical; 85% of tree topologies differed by only one edge. Cost-effective approach. [Figure: histogram of frequency against distance.] Valdazo-Gonzalez et al (2015) Infec. Gen. Evol.
New era for sequencing? MiSeq (Illumina): bench-top platform; at The Pirbright Institute, located inside containment; 7 GB/run (up to 250 nt paired-end reads); high Q-score for data quality. Also cited on the slide: Brown and Underwood (1982) Characterisation of Danish and German FMDV isolates using ribonuclease T1 fingerprinting.
EpiSeq project. EpiSeq aims to exploit NGS technologies to generate improved tools for use in real-time monitoring of epidemics. Collaborators: Belgium, Germany, UK, Italy and Sweden (and Denmark). Target important RNA/DNA viruses: causing epidemic disease (FMDV/AIV); causing endemic disease (CSFV); 2 DNA viruses (ASFV and poxviruses). Results will bring novel insights into field epidemiology (monitoring trans-boundary movements) and evolutionary ecology (genetic determinants underpinning phenotypic traits).
NGS (Illumina) protocol for FMDV. PCR-free protocol: eliminates the requirement for extensive primer panels. Can be applied to any RNA virus with a poly(A) tail. High coverage suitable for consensus and deep sequencing. Multiplexing (up to 96 samples/run) is possible. Logan et al, (2014) BMC Genomics
NGS application: UK 2001 FMD Outbreak First case 20/02/01 Abattoir, Essex >2000 infected farms ~ 7 month period Epidemiological links between local farms are not well understood Sequencing of representative viruses from an archive from ~1500 farms is underway
Generating transmission trees: limitations and challenges. Assigning ownership to ancestral nodes. Cul-de-sacs, where sequenced material does not normally represent the material that is transmitted to a down-stream farm. Consequence: sequence-based analysis is compatible with a large number of fine-scale transmission trees. Considering a simple TCS tree: [Figure: TCS tree linking a source, Farm A, Farm B and Farm C, alongside the transmission trees inferred from it.]
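To illustrate why the number of candidate trees grows so quickly, the sketch below enumerates every "who-infected-whom" assignment for the three farms in the simple example, before any genetic or epidemiological constraint is applied. The function names are hypothetical, and no sequence-based pruning is attempted here.

```python
from itertools import product


def _reaches_source(farm, infector_of, source):
    """Follow infectors upstream; False if we loop back before reaching the source."""
    seen = set()
    node = farm
    while node != source:
        if node in seen:
            return False  # cycle (including self-infection)
        seen.add(node)
        node = infector_of[node]
    return True


def transmission_trees(farms, source="Source"):
    """Enumerate all trees in which every farm has exactly one infector
    (the source or another farm) and every chain leads back to the source."""
    hosts = [source] + list(farms)
    trees = []
    for infectors in product(hosts, repeat=len(farms)):
        infector_of = dict(zip(farms, infectors))
        if all(_reaches_source(f, infector_of, source) for f in farms):
            trees.append(infector_of)
    return trees


trees = transmission_trees(["Farm A", "Farm B", "Farm C"])
print(len(trees), "unconstrained transmission trees for 3 farms")
```

For n farms plus one source the unconstrained count is (n + 1)^(n - 1), so brute-force enumeration like this is only practical for a handful of farms; the genetic and temporal data serve to cut this space down.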
Linking datasets. Using field-epidemiological data as a framework for the sequence-based trees. Sequence data: relationship between sequences. Field epi data: date of cull; location; number of animals; est. age of oldest lesion; proposed epidemiological links with other cases (tracing exercises).
Integrating temporal data. Reduces the number of plausible transmission trees. 20 farms from the 2001 epidemic, within a single cluster. Full genome sequence data alone were consistent with >41,000 transmission trees. Simple infected and infectious windows applied: 4 trees represent >95% of total likelihood.
I_i(t): probability that the ith farm was infected at time t (discrete beta distribution).
L(k): probability of incubation for k days prior to becoming infectious (gamma distribution, 95% probability between 2 and 12 days).
Probability that the ith farm is a source of infection at time t (reading C_i as the cull date of the ith farm):

$$F_i(t) = \sum_{j=0}^{t} I_i(j) \sum_{k=1}^{t-j} L(k), \qquad t \le C_i$$

[Figure: timeline from the start of the outbreak; marker * = date of confirmation minus oldest lesion minus 5 d incubation.]
Cottam et al, (2008) Proc. Roy. Soc. B
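A sketch of how the infectious-window probability could be evaluated numerically: sum over possible infection days j, weighting each by the probability that incubation has finished by day t. Every number below is a placeholder: the gamma shape and scale are not fitted to the 2 to 12 day interval, the infection-day distribution is a hand-written stand-in for the discrete beta distribution, and treating the farm as no longer a source after its cull day is an assumption of this sketch.

```python
import math


def discretised_gamma(shape, scale, max_days):
    """L(k) for k = 1..max_days: gamma density at integer days, renormalised to sum to 1."""
    def pdf(x):
        return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

    weights = [pdf(k) for k in range(1, max_days + 1)]
    total = sum(weights)
    return {k: w / total for k, w in enumerate(weights, start=1)}


def source_probability(infection_prob, incubation, cull_day, t):
    """F_i(t): probability that farm i is infectious (a possible source) on day t.

    infection_prob: dict day j -> I_i(j)
    incubation:     dict k -> L(k)
    cull_day:       C_i; assumed here to end the farm's infectious window
    """
    if t > cull_day:
        return 0.0
    total = 0.0
    for j, p_infected in infection_prob.items():
        if j > t:
            continue
        # probability that incubation lasted at most t - j days
        p_incubated = sum(incubation.get(k, 0.0) for k in range(1, t - j + 1))
        total += p_infected * p_incubated
    return total


# Placeholder inputs (hypothetical, for illustration only).
I_i = {3: 0.2, 4: 0.5, 5: 0.3}  # stand-in for a discrete beta distribution
L = discretised_gamma(shape=4.0, scale=1.5, max_days=30)
C_i = 20

curve = [source_probability(I_i, L, C_i, t) for t in range(0, 25)]
for t in (5, 10, 15, 20, 21):
    print(t, round(curve[t], 3))
```

Multiplying such windows for a candidate infector and its recipient is one way a likelihood can be attached to each tree, which is how a small number of trees can come to dominate the total.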
Using evolutionary rates? Remarkably consistent substitution rate: ~2 x 10^-5 nt substitutions/site/day. Retrospective analysis of the 1968-9 epidemic in the UK. Evolutionary standstill for some viruses indicative of a role for fomites (transmission via inanimate objects)? [Figure: number of nt substitutions against outbreak timeline (days).] Wright et al., (2014) Infec. Gen. Evol.
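As a rough worked example of what this rate implies, the sketch below multiplies it by the approximate genome length quoted earlier in the talk (~8300 nt). It assumes a strict clock applied uniformly across the genome, which is a simplification; the function name is illustrative.

```python
RATE_PER_SITE_PER_DAY = 2e-5  # substitution rate quoted on the slide
GENOME_LENGTH_NT = 8300       # approximate FMDV genome length quoted in the talk


def expected_substitutions(days, rate=RATE_PER_SITE_PER_DAY, genome_length=GENOME_LENGTH_NT):
    """Expected genome-wide substitutions after `days`, under a strict clock."""
    return rate * genome_length * days


per_day = expected_substitutions(1)
days_per_substitution = 1 / per_day
print(f"{per_day:.3f} substitutions per genome per day")
print(f"about one substitution every {days_per_substitution:.1f} days")
```

On these assumptions a full genome accumulates roughly one change every six days, which is why full-genome data can separate farms infected days apart while a single short region often cannot.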
Bayesian models Reconstructing transmission trees from different datasets Framework leading to a Bayesian inference scheme that combines genetic and epidemiological data Based on a dynamic model of pathogen transmission between source and receptor premises Accommodates spatial (GIS) and temporal data Work in progress! Morelli et al, (2012) PLoS Comp. Biol.
Applications: estimating what we do not see? Assuming that polymerase error rates are clock-like and can be estimated, and that population structures can be modelled: can we use the differences between sequences to model the extent of unsampled sequences ("dark matter") between samples received for analysis? At the regional scale, can this be used as a proxy for FMD prevalence? Calibrated with real data. Andrew Rambaut, University of Edinburgh
[Figure: "dark matter" (un-sampled sequences) shown at three levels: intra-host viral pathways, transmission to another animal, and regional pathways.]
Summary and future priorities Future will deliver new platforms and increased capacity to generate sequence data Requires close relationships between molecular virologists, bioinformaticians and informaticians Improved pipelines to reduce process-error and/or development of models to accommodate error in our data Approaches to translate genetic relationships into transmission trees (also using epidemiological data) Reliable (statistical) measures of the likelihood of transmission links
Acknowledgements: The FMD Reference Laboratory Nick Juleff David Paton Jan Kim John Hammond Partners on EpiSeq project Work supported by: Photo courtesy of HDR Architecture, Inc.; 2014 James Brittain