
Section D. Custom sequence annotation

After this exercise you should be able to use the annotation pipelines provided by the Influenza Research Database (IRD) and Virus Pathogen Resource (ViPR) to annotate your own virus genome sequences.

I. Annotate influenza virus segment sequences

For this exercise, you will use IRD (http://www.fludb.org) to annotate an influenza virus segment sequence.

1. Influenza virus segment sequence annotation

You can annotate your influenza sequences using IRD's unique sequence curation/annotation pipeline, which will determine the influenza type, segment number, subtype (if appropriate), and translated amino acid sequence(s) for each segment submitted.

a. From the grey navigation bar, mouse over Analyze & Visualize and click Annotate Nucleotide Sequences.

b. If you have your own sequence, prepare the sequence in FASTA format, save it in plain text and use .fasta as the file extension. FASTA file example:

>gb:am911100 Organism:Influenza A virus A/Anas acuta/slovenia/470/06 Segment:4 Subtype:H5N1 Host:Northern Pintail
AGCAAAAGCAGGGGTTCAATCTGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTT
AAAAGTGATCAGATTTGCATTGGTTACCATGCAAACAACTCGACAGAGCAGGTTGACACAATAATGGAAA

Otherwise, use a sample sequence from: http://tinyurl.com/p8leagk

c. Either paste your sequence in FASTA format into the sequence box or upload your FASTA file. Provide your email address so that IRD can contact you if there are problems in the annotation process. Click Validate Sequence(s) to start the annotation process.

[Screenshot: Annotate Nucleotide Sequences page, showing the required Submitter Email field, the box for pasting sequences in FASTA format, the file upload control, and the Cancel and Validate Sequence(s) buttons.]

The page describes the tool as follows: This tool is an interactive version of the IRD Team's influenza annotation pipeline. You can submit any number of nucleotide sequences in FASTA format for validation and annotation. The pipeline will align the sequences against a consensus sequence profile to identify possible sequencing errors, determine the influenza type, segment number, and for segments 4 and 6 of type A, the subtype, and translate the nucleotide sequence. You will be provided with a report containing these results and a list of possible sequencing errors or, if critical errors are encountered, a description of the errors. This pipeline was developed and is maintained in collaboration with the IRD team at Los Alamos National Laboratory, headed by Dr. Catherine Macken. Either paste or upload segments in FASTA format and click the Validate Sequence(s) button. Note: an asterisk (*) marks a required field.

Created by the ViPR/IRD team and licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License
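The translation step that the tool description mentions can be illustrated on the sample sequence. The following is a minimal sketch, not IRD's implementation: it assumes the standard genetic code and a coding start at nucleotide 29, which is the CDS start that IRD reports for this sample on its result page. The names `CODON_TABLE` and `translate` are introduced here for illustration only.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str, cds_start: int = 1) -> str:
    """Translate from a 1-based CDS start and stop at the first stop codon.

    Codons containing ambiguous bases are rendered as 'X'.
    A trailing partial codon is ignored.
    """
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(cds_start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First two lines of the sample HA segment from step b (140 nt only).
sample = (
    "AGCAAAAGCAGGGGTTCAATCTGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTT"
    "AAAAGTGATCAGATTTGCATTGGTTACCATGCAAACAACTCGACAGAGCAGGTTGACACAATAATGGAAA"
)
print(translate(sample, cds_start=29))
```

Because only the first 140 nucleotides are used, the printed peptide is just the beginning of the HA protein; it should match the start of the translated protein sequence shown on the IRD result page.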

d. After the annotation process is finished, a Sequence Annotation Result page will be loaded. Here you will see the flu type, segment number, subtype (if you provided HA or NA sequences), and translated protein sequence. You can also download the annotation report by clicking the Annotation Report button.

[Screenshot: Sequence Annotation Result page, showing the ticket number, the Annotation Summary Report and Detailed Annotation Report buttons, and the annotation details. Click on + to view the detailed annotation report.]

For the sample sequence, the page reports:

Sequences submitted: 1
Sequences that pass auto-curation without any warning message: 1
Sequences that pass auto-curation with warning message(s): 0
Sequences that fail auto-curation with error message(s): 0

Annotation Details:
>gb:am911100 Organism:Influenza A virus A/Anas acuta/slovenia/470/06 Segment:4 Subtype:H5N1 Host:Northern Pintail
Flu Type: A
Segment number: 4
Subtype: H5
Sequence Length: 1779
Submitted Sequence: AGCAAAAGCAGGGGTTCAATCTGTCAAAATGGAGAAAATAGTGCTTCTTC [full]
Translated Protein Sequences:
Protein Name: HA Hemagglutinin
CDS Location: 29..1735
MEKIVLLLAIVSLVKSDQICIGYHANNSTEQVDTIMEKNVTVTHAQDILE [full]

e. If you would like to deposit influenza sequences in GenBank, you can submit them through the IRD site using IRD's sequence submission utility. To do so, click the Submit Data tab in the grey navigation bar and follow the prompts.

2. H5N1 Clade Classification

IRD has a Highly Pathogenic H5N1 Clade Classification Tool developed by Dr. Catherine Macken's group at Los Alamos National Laboratory, which can classify the clade of the HA gene of highly pathogenic H5 viruses. The IRD algorithm has been verified as highly accurate (> 99%) for sequences of at least 300 nucleotides of HA1.

a. From the grey navigation bar, mouse over Analyze & Visualize and click HPAI H5N1 Clade Classification.

b. If you have your own H5 sequence, prepare the sequence in FASTA format, save it in plain text and use .fasta as the file extension. FASTA file example:

>gb:am911100 Organism:Influenza A virus A/Anas acuta/slovenia/470/06 Segment:4 Subtype:H5N1 Host:Northern Pintail
AGCAAAAGCAGGGGTTCAATCTGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTT

Otherwise, use a sample sequence from: http://tinyurl.com/p8leagk

c. Either paste your sequence in FASTA format into the sequence box or upload your FASTA file. Click Run to proceed.
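Before clicking Run, it can help to check a FASTA file locally against the limits stated for the clade tool (an H5 HA sequence of at least 300 nucleotides). The sketch below is only an illustration and is not part of IRD: it assumes headers carry `Segment:` and `Subtype:` fields as in the example above, and the function names `read_fasta` and `check_h5_input` are hypothetical.

```python
import re

# IUPAC nucleotide codes plus the gap character.
VALID_BASES = set("ACGTURYKMSWBDHVN-")


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_h5_input(header, sequence, min_length=300):
    """Return a list of problems; an empty list means the record looks usable."""
    problems = []
    bad = set(sequence) - VALID_BASES
    if bad:
        problems.append("unexpected characters: " + "".join(sorted(bad)))
    if len(sequence) < min_length:
        problems.append(f"only {len(sequence)} nt, fewer than {min_length}")
    # Header fields are only checked when present (assumed IRD-style headers).
    segment = re.search(r"Segment:(\S+)", header)
    if segment and segment.group(1) != "4":
        problems.append("header says segment " + segment.group(1) + ", not 4")
    subtype = re.search(r"Subtype:(\S+)", header)
    if subtype and not subtype.group(1).upper().startswith("H5"):
        problems.append("header says subtype " + subtype.group(1) + ", not H5")
    return problems


example = """>gb:am911100 Organism:Influenza A virus A/Anas acuta/slovenia/470/06 Segment:4 Subtype:H5N1 Host:Northern Pintail
AGCAAAAGCAGGGGTTCAATCTGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTT
"""
for header, sequence in read_fasta(example):
    print(header.split()[0], check_h5_input(header, sequence))
```

The one-line excerpt used here is flagged as too short, which is expected; a full-length segment 4 sequence would return an empty list.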

[Screenshot: Highly Pathogenic H5N1 Clade Classification Tool page, showing the Analysis Name field, the Input Sequences options (upload a file containing sequences in FASTA format, here AM911100.fasta, or paste sequences in FASTA format), and the Clear and Run buttons.]

The page describes the tool as follows: The IRD team has implemented an algorithm for classifying the clade of the hemagglutinin gene of influenza A viruses whose HA belongs to the A/goose/Guangdong/1/96 (H5N1) lineage, that is, the HA lineage of the so-called highly pathogenic H5 viruses. This algorithm was developed by IRD team member Catherine Macken, of Los Alamos National Laboratory. It uses phylogenetic analysis to place HA (H5) sequences within the WHO classification scheme. The IRD algorithm has been verified as highly accurate (> 99%) for sequences of at least 300 nucleotides of HA1. See the SOP linked from the tool page for more details.

This tool only handles segment 4 sequences with confirmed H5 serotype and lengths greater than 300 nucleotides. Sequences from other serotypes of HA, or from other segments, will yield unpredictable and likely incorrect results. If you are unsure of your sequence's segment or serotype, use the IRD Sequence Annotation Tool, found on the Analyze & Visualize menu by clicking the Annotate Nucleotide Sequences link.

d. After the classification is finished, an H5N1 Clade Classification Report page will be loaded. Here you will see the clade assigned to your input sequence. You can download the report by clicking the Download Raw Result button.

[Screenshot: H5N1 Clade Classification Report page, with the Save Analysis and Download Raw Result buttons.]

For the sample sequence, the report shows:

Sequence Identifier: gb:am911100 Organism:Influenza_A_virus_A/Anas_acuta/Slovenia/470/06 Segment:4 Subtype:H5N1 Host:Northern_pintail
Clade Assignment: 2.2.1

II. Annotate Hepatitis C Virus (HCV) genome sequences

For this exercise, you will use ViPR (www.viprbrc.org) to annotate a Hepatitis C Virus genome sequence, determine its genotype and identify sites of recombination if applicable.

ViPR provides a Genome Annotator (GATU) to help you annotate your own virus genome sequences. To use GATU, you will need to select a previously annotated reference sequence and then use GATU to transfer the annotations to a target genome sequence.

1. Annotating an HCV genome sequence

a. Go to www.viprbrc.org and click Flaviviridae to get to the family page.

b. Mouse over the Analyze & Visualize tab in the grey navigation bar and click Genome Annotator (GATU).

c. In order to annotate your own sequence, you need to select a previously annotated reference sequence. If you already have an annotated reference sequence in .gb format, click Launch GATU to proceed directly. If not, you can use ViPR BLAST to search for a closely related annotated sequence as your reference.
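Since the choice in step c depends on whether a file is already in .gb (GenBank flat file) format, a quick local check can avoid uploading the wrong file. This is a minimal sketch under two assumptions: GenBank flat files begin with a `LOCUS` line and FASTA files begin with a `>` header line. The function name `guess_format` and the placeholder records are made up for illustration.

```python
def guess_format(text: str) -> str:
    """Guess 'genbank', 'fasta', or 'unknown' from the first non-blank line."""
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("LOCUS"):
            return "genbank"
        if line.startswith(">"):
            return "fasta"
        return "unknown"
    return "unknown"


# Made-up placeholder contents, not real records.
reference_text = "LOCUS       EXAMPLE_REF\nDEFINITION  placeholder reference record.\n//\n"
target_text = ">example_target\nAGATACCTGCCTCTTACGAGGCGACACTCC\n"

print(guess_format(reference_text))  # genbank
print(guess_format(target_text))     # fasta
```

In practice you would read the text from the file you intend to upload, for example with `open(path).read()`.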

[Screenshot: Flaviviridae Genome Annotator (GATU) page, showing the File Path field with the Browse and Go buttons for the BLAST search, and the Launch GATU button. Callouts: launch GATU directly if you have a reference sequence in .gb format; otherwise use your target sequence to BLAST for a closely related annotated sequence as your reference.]

The page describes the tool as follows: GATU, a Genome Annotation Transfer Utility (Tcherepanov, et al., BMC Genomics 2006, 7:150; PubMed: 16772042), is an initial-stage tool to transfer annotations from a previously annotated reference to a new, closely related target genome. ViPR users should ensure that their system has Java 1.6 or higher. The GATU interface provides controls for uploading a reference .gb file of the relevant viral family, along with the target genome in .gb or FASTA format. When done, a table summarizes the similarities of transferred annotations and provides users with checkbox control over which to accept. GATU also detects ORFs in the target and provides bioinformatics tools to assess whether these should be annotated. The annotated target genome can be saved in multiple file formats. Originally developed at the University of Victoria, GATU was adapted for use with ViPR.

REFERENCE SEQUENCE: To use GATU, you will need to select a reference sequence. If you already have an appropriate GenBank file, proceed directly to Launch GATU. If not, you can use ViPR BLAST to search for one. Browse to your target sequence in FASTA format, then click Go. Run a BLAST search, pick a reference sequence file, and download it to your directory in GenBank format. Then click Launch GATU and upload your reference and target sequences using the respective controls.

i. If you have your own sequence, prepare the sequence in FASTA format, save it in plain text and use .fasta as the file extension. FASTA file example:

>gi|375127704|gb|JN714194.1| Hepatitis C virus subtype 3a isolate RASILBS2-SR-PO polyprotein gene, complete cds
AGATACCTGCCTCTTACGAGGCGACACTCCACCATGGATCACTCCCCTGTGAGGAACTTCTGTCTTCACGCGG
AAAGCGCCTAGCCATGGCGTTAGTACGAGTGTCGTGCAGCCTCCAGGACCCCCCCTCCCGGGAGA

Otherwise, you can use a sample sequence from: http://tinyurl.com/ou7lpcr

ii. Click Browse, find the target sequence file on your computer, and click Go to run a BLAST search against annotated HCV reference sequences in ViPR.

iii. After BLAST is finished, a list of recommended reference sequences will be displayed. Choose a closely related sequence and download its GenBank file to your computer.

[Screenshot: GATU page after the BLAST search, listing the recommended reference sequences with a Download GenBank File link for each, and the Launch GATU button.]

The page instructs: Here are some recommended Reference Sequences. Select one, click the link Download GenBank File, and save the file to your local machine. Now click Launch GATU. Under Genome Selection > Reference Genome, click Upload Genome File and browse to the saved Reference.

Reference    gi        gb          Sequence header                                    Bit Score   E Value
EXT445955    445955    157781216   Hepatitis C virus genotype 3, genome               9587        0.0
EXT383780    383780    22129792    Hepatitis C virus genotype 1, complete genome      896         0.0
EXT446165    446165    157781212   Hepatitis C virus genotype 2, complete genome      767         0.0
EXT446144    446144    157781208   Hepatitis C virus genotype 4, genome               755         0.0
EXT445982    445982    157781214   Hepatitis C virus genotype 6, complete genome      737         0.0

d. Now, click Launch GATU to run the GATU application. A dialog box will pop up. Click Allow to allow the GATU applet to be loaded on your computer.

e. In the GATU window, upload your .gb file as the Reference Genome and your target genome FASTA file as the Genome to Annotate.
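Choosing "a closely related sequence" from the recommended references usually means taking the hit with the highest bit score. The sketch below encodes the five hits listed above and picks the top one; it is an illustration only, and the `BlastHit` type and `best_hit` function are hypothetical names. Bit score is used as the sole criterion here, which is an assumption; in practice you may also want to confirm that the reference is completely annotated.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    reference_id: str
    description: str
    bit_score: float
    e_value: float


# Values copied from the recommended reference list for the sample sequence.
hits = [
    BlastHit("EXT445955", "Hepatitis C virus genotype 3, genome", 9587, 0.0),
    BlastHit("EXT383780", "Hepatitis C virus genotype 1, complete genome", 896, 0.0),
    BlastHit("EXT446165", "Hepatitis C virus genotype 2, complete genome", 767, 0.0),
    BlastHit("EXT446144", "Hepatitis C virus genotype 4, genome", 755, 0.0),
    BlastHit("EXT445982", "Hepatitis C virus genotype 6, complete genome", 737, 0.0),
]


def best_hit(candidates):
    """Return the hit with the highest bit score."""
    return max(candidates, key=lambda hit: hit.bit_score)


top = best_hit(hits)
print(top.reference_id, top.description, top.bit_score)
```

For the subtype 3a sample sequence, the genotype 3 reference stands far above the other hits, so it is the natural choice to download as the GATU reference.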

f. Click Annotate to run the annotation process. When done, a table is displayed which summarizes the similarities of transferred annotations and provides users with checkbox control over which to accept.

g. Click the Save button to save the annotated target genome in one of several file formats: GenBank, EMBL, or XML.

2. HCV genome sequence genotype determination and recombination detection

ViPR provides a Genotype Determination and Recombination Detection Tool for Flaviviridae viruses. This tool estimates the most likely genotype for the input sequences and identifies sites of recombination.

a. Mouse over the Analyze & Visualize tab in the grey navigation bar and click Genotype Determination and Recombination Detection.

[Screenshot: Genotype Determination and Recombination Detection page, showing the Analysis Name field, the Select Species drop-down list set to HCV, the Source of Sequences to be Analyzed options (upload a file containing sequences in FASTA format, here HCV_JF343783.fasta; paste sequences in FASTA format; or use working sets), and the Clear and Run buttons. Callout: only 1 sequence is needed. Sequences can also be selected from search results or a working set in your workbench.]

The page describes the tool as follows: This annotation pipeline takes an alignment of sequences containing at least two representatives from each taxon. This reference alignment is then used to construct a distance-based tree, which is then parsed in order to find the closest relatives for any query sequence using a Branch Indexing method. By incorporating a static window size, this pipeline can also identify any recombinant query sequence. When the analysis is completed, a graphical representation of the score corresponding to the genotype classification for each region of the "sliding window" will be shown. A spreadsheet file with the results will also be available for download. This tool is based on the Genotype Determination Tool developed by Carla Kuiken's group at Los Alamos National Laboratory for the HCV database. (SOP)
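The sliding-window idea in the tool description can be sketched in a few lines. Note that this is a deliberately simplified stand-in and not the ViPR pipeline: instead of building a distance-based tree and applying Branch Indexing, it assigns each window to the reference with the smallest proportion of mismatches (p-distance). The sequences, window size, and function names are all made up for illustration, and the inputs are assumed to be already aligned and of equal length.

```python
def p_distance(a, b):
    """Proportion of differing sites among positions where both are A/C/G/T."""
    compared = differing = 0
    for x, y in zip(a, b):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differing += x != y
    return differing / compared if compared else 1.0


def window_calls(query, references, size, step):
    """Label each window of an aligned query with its nearest reference.

    Returns (start, end, label) tuples using 1-based inclusive coordinates.
    """
    calls = []
    for start in range(0, len(query) - size + 1, step):
        end = start + size
        label = min(
            references,
            key=lambda name: p_distance(query[start:end], references[name][start:end]),
        )
        calls.append((start + 1, end, label))
    return calls


# Toy aligned references (40 nt each) and a query that switches from one
# reference to the other halfway through, mimicking a recombinant.
references = {
    "2a": "ACGTACGTAC" * 4,
    "2b": "ACGTTCGAAC" * 4,
}
query = references["2a"][:20] + references["2b"][20:]

for call in window_calls(query, references, size=10, step=10):
    print(call)
```

A change of label between adjacent windows is what suggests a recombination breakpoint; the real tool reports a score per window rather than a single nearest label.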

b. On the Genotype Determination and Recombination Detection landing page, select HCV from the Select Species drop-down list.

c. Download a sample HCV sequence file from: http://tinyurl.com/q6ystk4

d. Input your sequence by uploading the FASTA-formatted sequence file or pasting the FASTA-formatted sequence in the box. Then click Run. Note that you can also input sequences from a working set saved in your Workbench.

e. After the analysis is finished, the Report page will be displayed. Here you can:

i. View the predicted genotype and recombination type (if applicable).

ii. Download a spreadsheet listing the detailed results of recombination determination.

iii. View the genotyping results in graphical format.

iv. View the alignment by clicking Visualize Aligned Sequences. In the JMOL alignment window, you will see representative sequences from each taxon selected by ViPR at the top of the window and your query sequence at the bottom. Delete irrelevant reference sequences (genotypes 1, 2c, 2i, 2k, 3-7) by selecting sequences in the window, then right-clicking on the sequences and clicking Hide Sequences. Examine the region spanning positions 8000-8400 and see if you can find any recombination event.

v. Download or view the phylogenetic tree based on the alignment of your sequence with representative sequences from each taxon selected by ViPR.

[Screenshot: Genotype Report page, with the Save Analysis and Run Analysis buttons.]

For the sample sequence, the report shows:

Genotype Information
Whole Genome Genotype prediction: 2a
Whole Genome Recombination Type: 2a,2b

Genotype: The genotype results include a tab-separated file listing the sequence name, a single consensus genotype result for the entire genome, and the confidence metric. (Download)

Recombination: This is an Excel spreadsheet listing the results for all windows for the sequence. (Download)

Genotyping results in graphical format

Alignment: This is the multiple sequence alignment of your sequence with a ViPR reference sequence alignment that consists of at least 2 representatives from each taxon. (Download Aligned Fasta, Visualize Aligned Sequences)

Tree: This is the tree generated by PAUP based on the input alignment for the whole genome. (Download Newick File, View phylogenetic tree)
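A recombination type such as "2a,2b" can be read as the ordered list of genotypes seen along the genome. The sketch below shows one way to derive that summary from per-window genotype calls. It is an assumption-laden illustration: the layout of the ViPR recombination spreadsheet is not described in this exercise, so the `(start, end, genotype)` tuples, the coordinates, and the function names are hypothetical.

```python
from itertools import groupby


def merge_calls(calls):
    """Merge consecutive windows with the same genotype into segments.

    `calls` is a list of (start, end, genotype) tuples ordered along the genome.
    """
    segments = []
    for genotype, group in groupby(calls, key=lambda call: call[2]):
        group = list(group)
        segments.append((group[0][0], group[-1][1], genotype))
    return segments


def recombination_type(segments):
    """Return the distinct genotypes in order of first appearance, comma-joined."""
    seen = []
    for _, _, genotype in segments:
        if genotype not in seen:
            seen.append(genotype)
    return ",".join(seen)


# Made-up window calls for illustration; not taken from a real report.
calls = [
    (1, 400, "2a"),
    (401, 800, "2a"),
    (801, 1200, "2b"),
    (1201, 1600, "2b"),
]

segments = merge_calls(calls)
print(segments)
print(recombination_type(segments))
```

With these toy calls the genome splits into one 2a segment and one 2b segment, and the boundary between them marks the approximate breakpoint to inspect in the alignment view.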