Fondation Merieux J Craig Venter Institute Bioinformatics Workshop. December 5-8, 2017

Module 5: Comparative Genomics Analysis

Outline
- Definition of comparative genomics
- Applications in studying human pathogens
- Analysis tools
- Case studies

Comparative Genomics
- comparing genomes of different species
- for illuminating evolutionary mechanisms and forces
- for informing the understanding of the human genome
- conserved vs. unique characteristics

A comparison of general features of genomes Touchman, J. (2010) Comparative Genomics. Nature Education Knowledge 3(10):13

A comparison of chromosomal changes Conserved segments in the human and mouse genome Human chromosomes, with segments containing at least two genes whose order is conserved in the mouse genome as color blocks. Each color corresponds to a particular mouse chromosome. (International Human Genome Sequencing Consortium; Lander, E. S. et al. 2001)

Applications in Studying Human Pathogens
- how genomic diversity is changing over time
  - whether evolution occurs indigenously
  - determining the impact of introduced strains versus indigenous evolution on disease outcomes
- identifying genes experiencing strong positive selection
  - viral determinants of disease severity
  - viral adaptive drivers
  - the role of quasispecies in disease pathogenesis
- vaccine development depends on understanding genetic diversity
  - universal vaccines
  - specific vaccines: against pathogens closely related to commensal microorganisms

Comparative Genomics Analysis Tools
- whole genome comparison
- UCSC Browser: a large collection of genomes
- Ensembl: eukaryotic genomes
- offering tools for developing testable hypotheses

Analysis and Visualization Tools

Analysis & Visualization

Private Workbench

Workbench

Tight Integration of Data and Analysis Tools
- metadata coloring
- ~730,000 animal surveillance records
- ~14,000 virus protein structures overlapped with pre-computed SNP scores, Sequence Features
- systems biology data and analysis
- ~2000 personal workbench accounts
- viral protein Sequence Features

Case study: Influenza virus antigenic variations in Sa epitope

Background
- Antigenic variation allows pathogens to escape from immunity
- Caton (1982): Sa epitope

Sequence Feature Variant Types (SFVT)

Sequence Features
- knowledge base of characterized regions:
  - markers used in diagnosis
  - genetic determinants for host range, virus virulence, replication efficacy
  - epitopes
  - structural elements
- developed to support analysis of characterized regions
- systematically annotated Sequence Features (SFs) allow you to compare strains at the SF level

5,608 Curated Sequence Features for Influenza Virus
Sequence Features of Influenza A & B proteins

Counts appear in slide order: Structural SFs, Functional SFs, Epitopes, Sequence Alterations, Total no. of SFs. Rows with fewer than five values had blank cells on the slide.

Segment  Protein  Subtype  Counts
1        PB2               10  27  533  25  595
2        PB1               5  16  792  13  826
2        PB1-F2            2  6  24  2  34
3        PA                29  9  465  8  511
4        HA       All      1541
                  H1       37  10  517  3  567
                  H2       7  7  19  1  34
                  H3       59  8  295  30  392
                  H4       1  1
                  H5       14  80  372  8  474
                  H7       2  4  33  40
                  H9       1  30  31
                  H12      1  1
                  H13      1  1
5        NP                28  23  571  1  623
6        NA       All      756
                  N1       25  16  342  7  422
                  N2       59  9  205  9  297
                  N3       4  4
                  N5       2  2
                  N6       2  2
                  N7       1  1
                  N8       6  6
                  N9       14  14
7        M1                14  25  331  370
7        M2                12  14  97  8  132
                  N2       1  1
8        NS1               15  36  100  7  158
8        NS2               3  3  63  69
6        NA                8  8

Sequence Feature Variant Types Tutorial

This component of IRD provides data on specific characteristic regions and/or subregions, termed 'Sequence Features' (SFs), defined for all influenza virus proteins. The SFs and their metadata are derived from scientific literature and/or public domain databases. Variant types (VTs) of SFs are computed from multiple sequence alignments of all relevant protein sequences in IRD. Variant types that carry one or more mutations experimentally determined to give rise to a phenotype, such as increased virulence, are denoted Phenotype Variant Types (PVTs). These PVT annotations are only available for a subset of the influenza virus subtypes. Note: VT-1 is not always a functional epitope.

For a detailed description of the development and application of the SFVT approach for the study of influenza virus, see: Noronha JM, Liu M, Squires RB, Pickett BE, Hale BG, Air GM, Galloway SE, Takimoto T, Schmolke M, Hunt V, Klem E, García-Sastre A, McGee M, Scheuermann RH. (2012) Influenza Sequence Feature Variant Type (Flu-SFVT) analysis: evidence for a role of NS1 in influenza host range restriction. J Virol, 86: 5857-5866. doi: 10.1128/JVI.06901-11. PMID: 22398283

Sequence Feature search
Search fields: virus type (A, B, C), sequence feature type, subtype for HA and NA, segments, amino acid coordinates (start, to), immune recognition context for epitope (target MHC class, host, MHC allele), keyword search. Use commas to separate multiple entries (e.g. H1, H3, H7, N1, N2).
Subtype for HA and NA entered: H1
Results matching your criteria: 105
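The tutorial text says that variant types are computed from multiple sequence alignments. A minimal Python sketch of that idea follows, assuming the sequences are already aligned and that variant types are numbered by descending strain count. The function name `variant_types`, the numbering rule, and the toy strains are illustrative assumptions, not IRD's actual implementation.

```python
from collections import Counter

def variant_types(aligned, positions):
    """Group strains by their residues at the Sequence Feature positions.

    aligned:   dict of strain name -> aligned protein sequence (hypothetical input)
    positions: 1-based alignment columns that make up the Sequence Feature
    Returns rows of (VT label, variant string, strain count, differences from VT-1).
    """
    # Extract the SF residues from every strain and count identical variants.
    variants = Counter(
        "".join(seq[p - 1] for p in positions) for seq in aligned.values()
    )
    ranked = variants.most_common()  # assumption: most frequent variant becomes VT-1
    reference = ranked[0][0]
    table = []
    for number, (variant, count) in enumerate(ranked, start=1):
        diffs = sum(a != b for a, b in zip(variant, reference))
        table.append((f"VT-{number}", variant, count, diffs))
    return table

# Toy alignment (made-up strains, not IRD data).
toy = {"s1": "PNGSK", "s2": "PNGSK", "s3": "PNGSQ", "s4": "PNGSK", "s5": "PDGSQ"}
rows = variant_types(toy, positions=[2, 5])
for row in rows:
    print(row)
```

Each row corresponds to one line of the Variant Types table on the Sequence Feature page: the variant, how many strains carry it, and how many positions differ from VT-1.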

SEQUENCE FEATURE DEFINITION
Protein Name: HA
Sequence Feature Name: Influenza A_H1_antigenic site Sa_141(9)
Sequence Feature ID: Influenza A_H1_SF42
Reference Strain: A/California/04/2009(H1N1)
Reference Sequence Accession: FJ966082
Reference Position: 141(124 HA1), 142(125 HA1), 172(155 HA1), 174(157 HA1), 176(159 HA1), 177(160 HA1), 179(162 HA1), 180(163 HA1), 181(164 HA1)

SOURCE STRAIN(S)
Source Strain: A/Puerto Rico/8/34(H1N1)
VT Number: VT-22
Source Position: 141, 142, 171, 173, 175, 176, 178, 179, 180
Source Accession: CY033577
3D Protein Structure: 3AL4, 3LZG, 3UBE, 3UBJ, 3UBN, 3UBQ, 3UYW, 3UYX, 3ZTN, 4JTV, 4JTX, 4JU0, 4M4Y
Publication: PubMed:6186384
Epitope Type: B Cell
Evidence Codes: EXP
Epitope Sequence: P141, N142, E171, S173, P175, K176, K178, N179, S180
Comment: Sites Sa and Sb are in the upper part of the HA1 globular head. Sa occurs in the front region.

VARIANT TYPES
There are 424 variant types, but only 100 with the highest strain counts are displayed.

Variant Type  Strain Count
VT-1          16028
VT-2          6600
VT-3          1378
VT-4          421

VT-1 sequence by position: 141 P, 142 N, 172 G, 174 S, 176 P, 177 K, 179 S, 180 K, 181 S

Variations in the Sa epitope

VARIANT TYPES: Find a VT(s)
Edit specific positions in the VT-1 sequence with IUPAC symbols or use "?" as a wildcard. Click Search to find VT(s) conforming to the edited sequence. Click Reset to restore the panel to the default VT-1 sequence.

Sequence variation entered: 141 ?, 142 ?, 172 ?, 174 ?, 176 ?, 177 ?, 179 ?, 180 Q, 181 ?

Variant Type  Strain Count  Total Variations  Residues
VT-1          16028         0                 P N G S P K S K S (positions 141, 142, 172, 174, 176, 177, 179, 180, 181)
VT-9          1702          1                 differs from VT-1 by: Q
VT-21         2011          2                 differs from VT-1 by: N, Q
VT-83         5             4                 differs from VT-1 by: N, L, N, Q
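The "Find a VT(s)" search above can be sketched as simple pattern matching. This minimal example handles only literal residues and the "?" wildcard; IUPAC ambiguity symbols, which the search panel also accepts, are left out. The helper names are hypothetical. The VT-9 string is reconstructed from the slide (one variation from VT-1, Q at position 180, the eighth of the nine positions).

```python
def matches(pattern, variant):
    """True if every pattern position is '?' or equals the variant residue."""
    return len(pattern) == len(variant) and all(
        p == "?" or p == v for p, v in zip(pattern, variant)
    )

def find_variants(pattern, variants):
    """Return the labels of all variant types conforming to the pattern."""
    return [label for label, seq in variants.items() if matches(pattern, seq)]

# Sa epitope positions 141, 142, 172, 174, 176, 177, 179, 180, 181.
variants = {
    "VT-1": "PNGSPKSKS",  # default sequence shown on the slide
    "VT-9": "PNGSPKSQS",  # reconstructed: single change K180Q
}

# Wildcards everywhere except position 180, which is fixed to Q.
hits = find_variants("???????Q?", variants)
print(hits)
```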

Sa epitope polymorphism
- In the 2009-2013 flu seasons, VT-1 is the dominant VT; some strains carry VT-9.
- In the 2013-2014 flu season, VT-9 is the dominant VT.
- High immune pressure at position 180: 180Q has been positively selected.

Summary
- SFVT supports comparative analysis of characterized regions at the strain level.
- Sequences change under directional selection, a process whereby natural selection favors a single phenotype and continuously shifts the allele frequency in one direction.

Case study: Influenza H3N2 viruses, Netherlands, 2016-2017

Influenza Cases in Eight European Countries
Figure 5. Number of influenza cases admitted to ICUs by type and subtype in eight EU/EEA countries, seasons 2012-2013 to 2016-2017.
H3N2: many severe cases
Source: Risk-assessment-seasonal-influenza-2016-2017-update.pdf

Questions
- Explore the evolution of human H3N2 viruses in the Netherlands over the past three flu seasons.
- Predict whether the current vaccine would be protective against the most recent viruses.

Sequence Variation Analysis Workflow
1. Prepare sequence data and metadata
2. Align sequences and check quality
3. Construct a phylogenetic tree and explore phylogeny-metadata association
4. Identify the unique genetic substitutions of the 16-17 viruses
5. Determine if the substitutions are located in antigenic regions
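The step that identifies unique substitutions can be sketched in a few lines of Python. The sketch assumes the sequences are already aligned to equal length and split into a target group (for example the 16-17 viruses) and everything else. The function name `unique_substitutions` and the toy sequences are hypothetical. It reports only columns where the target group is fixed for a residue that no other sequence carries, a stricter criterion than the statistical test used by meta-CATS.

```python
def unique_substitutions(aligned, target_names):
    """Find alignment columns that cleanly separate the target group.

    aligned:      dict of sequence name -> aligned sequence (equal lengths assumed)
    target_names: set of names belonging to the group of interest
    Returns a list of (1-based position, target residue, residues seen elsewhere).
    """
    target = [seq for name, seq in aligned.items() if name in target_names]
    others = [seq for name, seq in aligned.items() if name not in target_names]
    length = len(next(iter(aligned.values())))
    hits = []
    for col in range(length):
        target_res = {s[col] for s in target}
        other_res = {s[col] for s in others}
        # Keep the column only if the target group is fixed for one residue
        # and that residue never occurs outside the group.
        if len(target_res) == 1 and not (target_res & other_res):
            hits.append((col + 1, target_res.pop(), "".join(sorted(other_res))))
    return hits

# Toy alignment (made-up sequences).
toy = {
    "old_1": "ACGTAC",
    "old_2": "ACGTAT",
    "new_1": "ATGTGC",
    "new_2": "ATGTGC",
}
subs = unique_substitutions(toy, {"new_1", "new_2"})
print(subs)
```

Positions returned this way would then be checked against known antigenic regions, as in the final workflow step.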

Personal Workbench

Support for user-provided sequence data and metadata

Upload Data and Associated Metadata
Name: H3N2_Netherlands
File Path: H3N2_Nether refs.fasta

Sequence-associated metadata (optional)
IRD supports metadata-based phylogenetic tree node coloring and metadata-based sequence group comparison by meta-CATS. To use these functions, provide sequence-associated metadata using one of the following options:
Option 1: Provide metadata in the defline of the uploaded FASTA file
Option 2: Provide metadata in a completed metadata template
  Step 1: Download the metadata template
  Step 2: Fill out the template according to the instructions in its comments
  Step 3: Upload the metadata spreadsheet (H3N2_Nether adata.xlsx)

Clades labeled on the phylogenetic tree: 3C.1 (2013-2015 vaccine), 3C.3a, 3C.3b, 3C.2a (2016-2017 vaccine), 3C.2a1

Comparing 3C.2a1 (2016-2017) with others

Metadata-driven Comparative Analysis Report (Ticket# MG_875402859372)

The Metadata-driven Comparative Analysis Tool (meta-CATS) consists of three parts: a multiple sequence alignment (using MUSCLE), a chi-square test of independence to identify positions (columns) of the multiple sequence alignment that significantly differ from the expected (random) distribution of residues between all metadata groups, and a Pearson's chi-square test to identify the specific pairs of metadata groups that contribute to the observed statistical difference. When 3 or more groups are included in the analysis, the P-value from the test of independence identifies columns having significant variation between all groups, while the Pearson's test identifies the specific pair(s) of groups that make the column significant (i.e. if those groups were not included in the analysis, the column would no longer be identified as significant). The (View SF) link associates each position with the Sequence Feature (SF) page. This link is visible only if SFs overlapping the protein position exist and if the sequence data being analyzed belong to the same viral species, type, or subtype.

Chi-square Test of Independence Result
There are 15 positions that have a significant non-random distribution between the specified groups. "*" in the position column indicates fewer than 5 non-zero residues in any cell of the contingency table.
Group1: merged group
Group2: merged group

Position  Chi-square Value  P-value    Degrees of Freedom  Residue Diversity
172       16.994            7.089E-4   3                   group1(12 C) group2(5 T)
234       16.994            7.089E-4   3                   group1(12 A) group2(5 G)
531       16.994            7.089E-4   3                   group1(12 C) group2(5 T)
561       16.994            7.089E-4   3                   group1(12 T) group2(5 G)
1197      14.995            0.001821   3                   group1(10 A) group2(5 G)
1264      14.995            0.001821   3                   group1(10 A) group2(5 G)
1269      14.995            0.001821   3                   group1(10 G) group2(5 A)
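The chi-square test of independence described in the report can be illustrated for a single alignment column. This is a minimal sketch, not the meta-CATS code: it computes the Pearson chi-square statistic from the group-by-residue contingency table, and it takes the 3 degrees of freedom listed in the report as given, using the closed-form survival function that holds for that specific case. The function names are hypothetical.

```python
import math
from collections import Counter

def chi_square_statistic(groups):
    """Pearson chi-square statistic for one alignment column.

    groups: dict of group name -> list of residues observed at the column
    """
    counts = {g: Counter(res) for g, res in groups.items()}
    residues = sorted({r for c in counts.values() for r in c})
    row_totals = {g: sum(c.values()) for g, c in counts.items()}
    col_totals = {r: sum(c[r] for c in counts.values()) for r in residues}
    n = sum(row_totals.values())
    stat = 0.0
    for g, c in counts.items():
        for r in residues:
            expected = row_totals[g] * col_totals[r] / n
            stat += (c[r] - expected) ** 2 / expected
    return stat

def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Closed form valid only for df = 3 (assumed here because the report lists 3).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Position 172 from the report: group1 has 12 C, group2 has 5 T.
stat = chi_square_statistic({"group1": ["C"] * 12, "group2": ["T"] * 5})
p_value = chi_square_sf_df3(stat)
print(stat, p_value)
```

For perfectly separated groups the statistic equals the number of sequences, which is close to the 16.994 (17 sequences) and 14.995 (15 sequences) values in the table. The small differences presumably come from details of the tool that this sketch does not reproduce.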

Comparing 3C.2a1 (2016-2017) with others
Groups compared: 3C.2a1; other 3Cs

Sequence Feature Details

SEQUENCE FEATURE DEFINITION
Protein Name: HA
Sequence Feature Name: Influenza A_H3_experimentally determined epitope_421(30)
Sequence Feature ID: Influenza A_H3_SF240
Reference Strain: A/Aichi/2/1968(H3N2)
Reference Sequence Accession: AB284320
Reference Position: 421(76 HA2)-450

SOURCE STRAIN(S)
Source Strain: A/Hong Kong/1/1968(H3N2)
VT Number: VT-1
Source Position: 421-450
Source Accession: CY044261
3D Protein Structure: 1EO8, 1HA0, 1HGG, 1HTM, 1KEN, 1QFU, 1QU1, 2HMG, 2VIU, 2YPG, 3EYM, 3HMG, 3VUN, 4HMG, 5HMG
Publication: IEDB:132043
Epitope Type: B Cell
Evidence Codes: N/A
Epitope Sequence: RIQDLEKYVEDTKIDLWSYNAELLVALENQ
Comment: N/A

VARIANT TYPES
There are 182 variant types, but only 100 with the highest strain counts are displayed.

Variant Type  Strain Count  Total Variations
VT-1          17701         0
VT-2          177           1
VT-3          2             30
VT-4          62            1
VT-5          199           1
VT-6          15            2
VT-7          56            1
VT-8          20            1
VT-9          33            1
VT-10         702           1

VT-1 sequence, positions 421-442 as visible on the slide: R I Q D L E K Y V E D T K I D L W S Y N A E
Variant residues visible on the slide: V, G, R, D, I, V, N

Slide annotation: 97% of VT-10 strains are from the recent 3 seasons, consistent with the meta-CATS result.

Summary
- The virus evolves rapidly, so it is important to monitor whether antigenic drift alters antigenicity.
- A/Texas/50/2012, the vaccine strain selected for the 13-14 and 14-15 flu seasons, represents clade 3C.1. However, strains circulating in 13-14 and 14-15 fall into the new clades 3C.3a, 3C.3b and 3C.2a.
- A/Hong Kong/4801/2014, the vaccine strain selected for the 16-17 season, belongs to 3C.2a. However, the majority of the 16-17 isolates form a new subclade, 3C.2a1, within 3C.2a.
- The meta-CATS comparison identified 15 substitutions that distinguish the newly emerged 3C.2a1 subclade from the older clades.
- Further experiments are needed to determine whether the substitutions found in the most recent isolates have altered antigenicity.

Case study: Zika viruses in Southeast Asia

Why did ZIKV in Southeast Asia not cause outbreaks and microcephaly on a large scale?
- Recent ZIKV has been circulating since 2010
- Overlooked or falsely detected because ZIKV diagnosis (primers), clinical presentation and management are the same as for DENV cases
- No reports of outbreaks or microcephaly

Regions shown: Africa, Southeast Asia, Pacific Islands, South America

Potential selective signatures
Figure 1. Phylogenetic analysis of Southeast Asian and South American ZIKV branches.
Figure 2. Amino acid residues of the pr peptide that are unique in the South American ZIKV isolates. Model was based on DENV pr peptide (3C5X).
https://doi.org/10.1016/j.apjtm.2016.10.002