Influenza H3N2 Virus Variation Analysis

Objective

Upon completion of this exercise, you will be able to:
- search for virus sequences and view detailed information about these sequences in IRD,
- build a phylogenetic tree on a set of sequences to infer their evolutionary relationships, and
- perform a multiple sequence alignment and run Meta-CATS to identify nucleotide or amino acid positions that differ between groups of virus sequences.

Background

Classical swine-origin H3N2 virus underwent a reassortment event with the M segment from the 2009 H1N1 pandemic (pH1N1) virus, giving rise to the H3N2 variant (H3N2v) virus, which infected humans. We are interested in finding the M segments from these two types of viruses and then comparing their sequences to identify variant positions using the data and tools provided in the Influenza Research Database (IRD).

Analysis Workflow

Search for sequences and save sequences into working sets:
(1) M1 protein from swine H3N2v virus from flu seasons 2008-2012
(2) M1 protein from "classical" swine H3N2 (non-H3N2v) virus from flu seasons 2008-2012
(3) M1 protein from pH1N1 virus

Nucleotide phylogenetic tree:
- combine working sets (1)-(3) into one (4)
- convert the combined M1 protein working set (4) into an M segment nucleotide working set (5)
- construct a phylogenetic tree using M segment working set (5), which contains M segment sequences from H3N2v, "classical" H3N2, and pH1N1 virus

Metadata-driven Comparative Analysis Tool for Sequences (Meta-CATS):
- compare amino acid sequences of (1) H3N2v virus and (2) classical H3N2 virus and identify positions that are significantly different between these two groups

Amino Acid Alignment:
- align amino acid sequences of (1) H3N2v virus and (2) classical H3N2 virus
- identify variant positions on the alignment

1. Search for M1 protein sequences from variant H3N2 virus, classical H3N2 virus, and H1N1 pandemic virus, and save sequences from each search to a separate working set

1.1 Search for M1 protein sequences from variant H3N2 virus in swine during flu seasons 2008-2012

a. Go to the IRD website (http://www.fludb.org). From the grey navigation bar, mouse over Search Data, then Search Sequences, and click Protein Sequences.

b. The Protein Sequence Search page offers many search options. Select the following parameters:
   Subtype: H3N2
   Select Proteins: [x] M1
   Complete Sequences: [x] Complete sequences only
   Host: [x] Swine
   Geographic Grouping: [x] North America
   Country: [x] USA
   2009 pH1N1 Sequences: [x] Include only pH1N1 proteins
   Advanced Options (click Advanced Options to view and select additional search options):
   Remove Duplicate Sequences: [x] Remove Duplicate Sequences - Only include one of the results if multiple proteins for the same sequence.
   Flu Season: [x] 11-12; 10-11; 09-10; 08-09

Note: IRD has an automated pH1N1 sequence classification algorithm that classifies sequences of human or swine viruses isolated since 2009 as either 2009 H1N1 pandemic-like or not pandemic-like. Consequently, the Sequence Search page offers the options of retrieving pH1N1-like sequences only or excluding pH1N1 sequences from the search results. For this exercise, we are looking for M1 protein sequences of pH1N1 origin, so the "Include only pH1N1 proteins" option will quickly retrieve the sequences of interest.

[Screenshot: Protein Sequence Search page with the criteria above selected. Results matching your criteria: 13.]

c. Note that IRD shows an instant count of search results to help you search quickly and efficiently. As you select criteria on a search page, you see immediately how many records match, without actually running the search. If there are too many or too few results, you can adjust the criteria on the search page to better fit your needs. After you have selected your criteria, click Search to run the query.

d. The search results are displayed in a table as shown below. Each column can be sorted by clicking its header. Click the Flu Season header to sort records by flu season. If you need advanced sorting options or want to display additional fields, click the Display Settings button. Note: you can click the icon next to any sequence of interest to view its details in a new page.

[Screenshot: Protein Sequence Search Results page. Your search returned 13 proteins, displayed 50 records per page and sorted by Flu Season in ascending order. The first five rows are shown below.]

Name  Sequence Accession  Complete Genome  Segment  Segment Length  Subtype  Date        Host Species  Country  State/Province  Flu Season (SOP)  Strain Name
M1    CY086923            Yes              7        987             H3N2     11/24/2009  Swine         USA      Minnesota       09-10             A/swine/Minnesota/239105/2009(H3N2)
M1    JF812322            Yes              7        979             H3N2     11/16/2010  Swine         USA      Iowa            10-11             A/swine/Iowa/A01049034/2010
M1    JN409418            Yes              7        982             H3N2     02/16/2011  Swine         USA      Kansas          10-11             A/swine/Kansas/11-104467/2011
M1    JQ689095            Yes              7        982             H3N2     01/13/2011  Swine         USA      Minnesota       10-11             A/swine/Minnesota/A01047396/2011
M1    JN652463            Yes              7        982             H3N2     02/14/2011  Swine         USA      Texas           10-11             A/swine/Texas/A01049555/2011

e. Note: from the Search Results page you can:
   i. Save the search query to your Workbench and rerun the search later.
   ii. Select records and run an analysis on them by mousing over the Run Analysis button and clicking the desired analysis option.
   iii. Download the sequences (gene, CDS, protein) or the displayed table by clicking Download.
   iv. Store selected sequences as a working set in the Workbench so that you can run various analyses on the working set.
   v. View additional details for any item in the results table by clicking the icon next to its row.

f. Save H3N2v M1 protein sequences to a working set
   i. Select all records by clicking the checkbox above the results table. Then click the Add to Working Set button.
      Note: A working set is a container where you can save regularly used sequences, strains, or other types of data. With the data of interest saved in a working set, you can run various analyses on the dataset without having to search for the data again each time. To use this feature, you need to register for an IRD Workbench account so that you can save your data to your online Workbench space. You will also be able to save and share data, analysis results, and searches online via the Workbench.

   ii. You will be prompted to log in to your Workbench if you haven't done so already. If you don't have an IRD Workbench account yet, register for one.
   iii. A lightbox displaying the Add to Working Set option will pop up. Name the working set H3N2-M1_swineUSA+pLike_2008-2012 and click Add to Working Set to save the sequences to a working set.

1.2 Search for M1 protein sequences from classical H3N2 virus in swine during flu seasons 2008-2012

a. On the Protein Sequence Search Results page, click the Search Criteria button to revise the search.

b. This search looks for classical H3N2 M1 protein sequences, that is, those without a pH1N1-origin M1. Change the 2009 pH1N1 Sequences selection to [x] Exclude all pH1N1 proteins, keep all other selected criteria, and click Search to run the query.

c. The Search Results page will be displayed. Repeat the steps in 1.1.e to view sequence details.

d. Return to the Search Results page by clicking the Results breadcrumb at the top of the page.

e. Select all records by clicking the checkbox above the results table. Then click the Add to Working Set button.

f. In the Add to Working Set lightbox, select Create a new working set with the selected items, name the working set H3N2-M1_swineUSA+pUnlike_2008-2012, and click Add to Working Set to save the sequences to a working set.

1.3 Search for the M1 protein sequence from a known H1N1 pandemic strain

a. Now we will search for the M1 protein sequence from the pandemic strain A/California/04/2009 and include it in our downstream analysis. Return to the Sequence Search page by clicking the corresponding breadcrumb. Alternatively, in the grey navigation bar, mouse over the Search Data tab, then Search Sequences, and click Protein Sequences.

b. On the Protein Sequence Search page, clear all selected criteria (if applicable) by clicking the Clear button at the bottom of the page. Select the following criteria and then click Search:
   Strain Name: A/California/04/2009
   Select Proteins: [x] M1

c. The Search Results page will be displayed. Select FJ969513, which is from a human host and has a segment length of 982, by ticking the checkbox next to it. Then click the Add to Working Set button.

d. In the Add to Working Set lightbox, select Create a new working set with the selected items, name the working set A/California/04/2009-M1, and click Add to Working Set to save the selected sequence to a working set.

e. Access your Workbench by clicking the Workbench tab in the grey navigation bar. The newly created working sets appear at the top of the content list. Click the icon next to a working set to display its items.
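
The Remove Duplicate Sequences option selected in step 1.1.b keeps one record when several proteins share the same sequence. As a rough illustration of that idea outside IRD, the Python sketch below de-duplicates FASTA records by exact sequence identity. The function names and the toy records are hypothetical, and the rule IRD uses to decide which duplicate to keep is not described in this tutorial; the sketch simply keeps the first record it encounters.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def remove_duplicate_sequences(records):
    """Keep only the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


# Toy input: hypothetical headers and made-up peptide fragments, not IRD data.
toy_fasta = """>seq1
MSLLTEVETY
>seq2
MSLLTEVETY
>seq3
MSLLTEVETP
"""

unique_records = remove_duplicate_sequences(parse_fasta(toy_fasta))
print([header for header, _ in unique_records])
```

Here seq2 is dropped because its sequence is identical to seq1. The same functions could be applied to a FASTA file downloaded from the Search Results page.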

[Screenshot: My Workbench page, with three working sets selected. The first rows of the content list are shown below.]

Content Name                         Type         Data Type  Items  Folder  Access   Date Modified
H3N2-M1_swineUSA+pUnlike_2008-2012   Working Set  Protein    23     -N/A-   Private  4/19/2013 8:44 PM EDT
A/California/04/2009-M1              Working Set  Protein    1      -N/A-   Private  4/19/2013 8:27 PM EDT
H3N2-M1_swineUSA+pLike_2008-2012     Working Set  Protein    13     -N/A-   Private  4/19/2013 7:45 PM EDT
Protein Sequence Search 1            Search       Protein    -N/A-  -N/A-   Unsaved  4/19/2013 3:48 PM EDT
Protein Sequence Search              Search       Protein    -N/A-  -N/A-   Unsaved  4/19/2013 2:21 PM

2. Phylogenetic analysis on M segment nucleotide sequences from variant H3N2, "classical" H3N2, and H1N1 pandemic viruses

2.1 Combine protein sequence working sets of variant H3N2, "classical" H3N2, and H1N1 pandemic viruses

a. From the Workbench content list, select the working sets created in Part 1 by checking the checkboxes next to them.

b. Click More Actions above the table. A lightbox will pop up. Click Combine to combine the selected working sets into a new working set. Name the new working set H3N2-M1_swineUSA_2008-2012_pLike_and_pUnlike_and_A/CA/04/09 and click Combine.

c. The combined working set will appear at the top of the Workbench table.

2.2 Convert the combined protein sequence working set into a segment working set

a. Select the combined protein sequence working set by ticking its checkbox.

b. Click More Actions above the table. A lightbox will pop up. Click Convert to convert the protein sequence working set into a segment working set. Name the new working set H3N2-Mseg_swineUSA_2008-2012_pLike_and_pUnlike_and_A/CA/04/09 and click Convert.

2.3 Construct a phylogenetic tree using the segment working set

a. In the Workbench content list, click the icon next to H3N2-Mseg_swineUSA_2008-2012_pLike_and_pUnlike_and_A/CA/04/09 to display the sequences in the working set.

b. On the working set page, select all records by checking the checkbox above the table, mouse over the Run Analysis button, and click Generate Phylogenetic Tree.

c. On the Generate Phylogenetic Tree page, you can choose the tree model and the tree tip labels. For this exercise, choose the following parameters and then click Build Tree:
   Tree Generation: [x] Quick Tree
   Label Tree Tips (Ends) with, Specify custom format of tip label: [x] Strain Name [x] Subtype

d. This tree analysis will finish in a few seconds. With a large amount of input, the analysis may take longer. While the analysis is running, you can choose to save the result to your Workbench upon completion by typing an analysis name in the Save Analysis to Workbench box and clicking the Save to Workbench button. You can then move to other parts of the site and retrieve the result from your Workbench later.

[Screenshot: Processing page. It shows a ticket number (here TR_346796346085) that can be used on the Retrieve Results by Ticket Number page to come back for the results later, the Save Analysis to Workbench box, and a Notification of Completion box where you can enter your email and click Request Notification.]

e. After the analysis is finished, the View Phylogenetic Tree page will load. Here you can save the tree file in Newick or PhyloXML format to your computer. Click View Tree to load the Archaeopteryx Tree Viewer window.

[Screenshot: View Phylogenetic Tree page with the Save Analysis, Newick File, PhyloXml File, Tree Parameters, and PhyML Log buttons, and the View Tree button.]

Note from the page: the IRD team has enhanced the Archaeopteryx phylogenetic tree viewer with a tree decorating capability that lets you color-code the "leaves" of a tree by host species, year, country, and subtype. In the tree viewer, use the drop-down menu for basic decoration to select the feature for coloring. The image and legend can be exported using options in the File drop-down menu. A user's guide is available.

f. A Tree Viewer window will pop up. How many clades do the sequences fall into?

g. Many tree customization options exist, including: reroot the tree, collapse/expand/display a subtree, swap descendants, decorate (color) the tree leaves by any associated metadata (e.g. host or year of isolation), resize the tree, zoom in/out, fit the tree to the window, and change the font size.
   i. In this exercise, we will decorate the tree by flu season to determine whether co-circulation of variant and classical strains is observed. To do so, click the Advanced Decoration button in the Tree Decorations section. A dialog box will pop up. Choose to decorate by Flu Season, tick the checkbox for Manual Decoration, and click Go.
   ii. A manual decoration dialog box will pop up. Here you can select a flu season, choose a color for that season, and click Apply to see the decorations on the tree. Click Show Legend.
   iii. Given the topology of the phylogenetic tree and the colors superimposed on it, we can conclude that the strains do not segregate by time of isolation, since both clades contain sequences isolated during multiple years.
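
The Newick file saved in step e can also be inspected outside the tree viewer. The Python sketch below shows one simple way to list the tip labels of a Newick string and search them for a substring, much like a text search over tip labels. It is a simplified illustration, not the viewer's implementation: the function names are hypothetical, the toy tree and the two swine strain labels in it are made up, and the regular expression assumes unquoted labels that contain no parentheses, commas, colons, or semicolons (real strain labels such as "...(H3N2)" would need quoting or a full Newick parser).

```python
import re


def newick_leaf_labels(newick):
    """Return leaf (tip) labels of a Newick string in order of appearance.

    A leaf label directly follows "(" or ","; internal node labels follow ")"
    and branch lengths follow ":", so neither is captured here.
    """
    return [label.strip() for label in re.findall(r"[(,]\s*([^():,;]+)", newick)]


def search_tips(newick, query):
    """Return the tip labels that contain `query`, ignoring case."""
    needle = query.lower()
    return [label for label in newick_leaf_labels(newick) if needle in label.lower()]


# Toy tree with hypothetical labels in a "strain_subtype" format (an assumption).
toy_tree = (
    "((A/swine/Iowa/1/2010_H3N2:0.01,A/California/04/2009_H1N1:0.02):0.05,"
    "A/swine/Illinois/2/2009_H3N2:0.10);"
)

print(newick_leaf_labels(toy_tree))
print(search_tips(toy_tree, "H1N1"))
```

For anything beyond a quick check, a dedicated phylogenetics library with a proper Newick parser is the safer choice.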

[Figure: decorated phylogenetic tree with clade labels "Classical H3N2" and "H3N2v + pH1N1".]

h. Find the pH1N1 sequence in the tree by typing H1N1 in the Search Tree For box. The A/California/04/2009 strain will be highlighted in green. Which clade does this strain group with? Can you explain why?

i. You can export the tree image by using the options under the File menu.

j. Save the tree analysis to your Workbench. To do so, return to the View Phylogenetic Tree page, click the Save Analysis button, and enter a name.

3. Metadata-driven Comparative Analysis Tool for Sequences (Meta-CATS)

Meta-CATS is a comparative genomics analysis tool in IRD that identifies nucleotide or amino acid positions that differ significantly between two or more groups of virus sequences. Meta-CATS consists of three parts:
- a multiple sequence alignment (using MUSCLE),
- a chi-square goodness of fit test to identify positions (columns) of the multiple sequence alignment that significantly differ from the expected (random) distribution of residues between all metadata groups, and
- a Pearson's chi-square test to identify the specific pairs of metadata groups that contribute to the observed statistical difference.

Use Meta-CATS to identify positions that are significantly different between the variant and classical H3N2 sequences.

a. Mouse over Analyze & Visualize in the top menu bar and click Metadata Sequence Analysis.

b. The Meta-CATS landing page will load.

[Screenshot: Meta-CATS landing page with the Sequence Grouping options (Manual Grouping, Auto Grouping), the P-value threshold box (0.05), the Input Sequences options (upload a file, paste sequences, or use working sets; here H3N2-M1_swineUSA+pLike_2008-2012 and H3N2-M1_swineUSA+pUnlike_2008-2012 are selected), and the Format of Sequences Provided options (Unaligned FASTA, Aligned FASTA, Nexus, Clustal). Required fields are marked with an asterisk (*).]

Note from the page: Meta-CATS performs a statistical analysis on sequences assigned to up to 5 different groups to determine which residues significantly correlate with one or more metadata fields. It looks for positions that differ significantly between user-defined groups of sequences. However, biological biases due to covariation, codon biases, and differences in genotype, geography, time of isolation, or other factors may affect the robustness of the underlying statistical assumptions. The P-value threshold is the maximum probability level for the likelihood that a position differs among the groups simply by chance. See the Meta-CATS tutorial and SOP linked from the page for a detailed description of how the tool works.

   i. Choose Manual Grouping if you want to group your sequences manually. To group sequences by host, country, year, viral species, virus type, host age, host gender, or cohort, use the Auto Grouping option instead. In this exercise, we will group the sequences by the two major branches from the phylogenetic tree analysis, variant H3N2 and classical H3N2, so choose Use working sets and select the working sets H3N2-M1_swineUSA+pLike_2008-2012 (variant H3N2) and H3N2-M1_swineUSA+pUnlike_2008-2012 (classical H3N2).
   ii. Choose Unaligned FASTA for Format of Sequences Provided. Keep the P-value threshold at 0.05. Then click Continue.

c. On the next page, your sequences are grouped by working set and automatically divided into two groups. Verify the sequence assignments. You can remove any sequence from the lists by selecting it and clicking Remove. Click Run when you are finished.

[Screenshot: Meta-CATS Setup Subset page listing the M1 sequences assigned to Group 1 (13 sequences) and Group 2 (23 sequences), with Add, Remove, Clear, and Run buttons. Sequences can be double-clicked or dragged to move them between the main list and the groups.]

d. While the analysis is running, you can choose to save the result to your Workbench upon completion by typing an analysis name in the Save Analysis to Workbench box and clicking the Save to Workbench button. You can then move to other parts of the site and retrieve the result from your Workbench later.
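
To make the statistics behind Meta-CATS more concrete, the Python sketch below runs a plain Pearson chi-square test on the residue counts of a single alignment column split into groups. It is an illustration under stated assumptions, not the Meta-CATS implementation: the function names are hypothetical, no continuity correction or small-count handling is applied, and the exact procedure IRD uses is defined in its SOP, so the numbers produced here need not match a Meta-CATS report. The example counts (13 R in one group, 23 K in the other) describe a column that is fully split between two groups of the sizes used in this exercise.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1), starting from
    Q(1, z) = exp(-z) for even df or Q(1/2, z) = erfc(sqrt(z)) for odd df.
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi_square_column_test(groups):
    """Pearson chi-square test for one alignment column.

    `groups` is a list of non-empty residue lists, one list per sequence group.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    counts = [Counter(group) for group in groups]
    residues = sorted(set().union(*counts))
    row_totals = [sum(c.values()) for c in counts]
    col_totals = [sum(c[r] for c in counts) for r in residues]
    n = sum(row_totals)

    statistic = 0.0
    for c, row_total in zip(counts, row_totals):
        for residue, col_total in zip(residues, col_totals):
            expected = row_total * col_total / n
            statistic += (c[residue] - expected) ** 2 / expected

    df = (len(groups) - 1) * (len(residues) - 1)
    p_value = chi2_sf(statistic, df) if df > 0 else 1.0
    return statistic, df, p_value


# One column: every group 1 sequence has R, every group 2 sequence has K.
group1 = ["R"] * 13
group2 = ["K"] * 23
statistic, df, p_value = chi_square_column_test([group1, group2])
print(statistic, df, p_value, p_value < 0.05)
```

A column that is identical in both groups gives zero degrees of freedom and is reported with a p-value of 1.0, that is, no evidence of a difference.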

e. The Meta-CATS analysis result has two reports: a Chi-square Goodness of Fit test result table listing the positions that have a significant non-random distribution between your specified groups, and a Pearson's chi-square test result table listing the specific pairs of groups that contribute to the observed statistical difference. Since this analysis only deals with two groups of sequences, we will focus primarily on the first (Chi-square Goodness of Fit) result table.

[Screenshot: Metadata-driven Comparative Analysis Report (Ticket# MG_416412150600) with the Save Analysis, Generate Phylogenetic Tree, Visualize Aligned Sequences, and Show P-Values Bar Plot buttons.]

Note from the page: when 3 or more groups are included in the analysis, the P-value from the Goodness of Fit test identifies columns having significant variation between all groups, while the Pearson's test identifies the specific pair(s) of groups that make the column significant (i.e. if those groups were not included in the analysis, the column would no longer be identified as significant).

Chi-square Goodness of Fit Test Result

There are 15 positions that have a significant non-random distribution between the specified groups. "*" in the Position column indicates fewer than 5 non-zero residues in any cell of the contingency table. The screenshot shows only the first rows, and the last visible row is cut off.

Position  Chi-square Value  P-value   Degrees of Freedom  Residue Diversity                                   Sequence Feature
15*       31.967            5.319E-7  3                   group1(13 I) group2(1 I, 1 L, 20 V, 1 X)
30*       35.993            1.528E-8  2                   group1(1 N, 12 S) group2(23 D)
95        31.792            1.716E-8  1                   group1(13 R) group2(23 K)
101*      28.073            1.168E-7  1                   group1(13 K) group2(1 K, 22 R)
116       31.792            1.716E-8  1                   group1(13 S) group2(23 A)
121       31.792            1.716E-8  1                   group1(13 T) group2(23 A)
139*      21.038            4.503E-6  1                   group1(1 A, 12 T) group2(21 A, 2 T)
142       31.792            1.716E-8  1                   group1(13 A) group2(23 V)
166       31.792            1.716E-8  1                   group1(13 A) group2(23 V)
181       31.792            1.716E-8  1                   group1(13 L)

f. Review the Chi-square test results to see the positions that differ significantly between the variant and classical H3N2 viruses in swine.
   i. How many positions are significantly different between the two groups? Write down the position numbers of all identified positions.
   ii. Sort the results by the Goodness of Fit P-value to push the most different positions to the top of the table. Which position has the most significant P-value?

4. Amino acid sequence alignment

a. Now we will align the amino acid sequences of variant and classical H3N2 viruses. First, combine the variant and classical H3N2 working sets into one:
   i. Go to your Workbench. Select the H3N2-M1_swineUSA+pLike_2008-2012 (variant H3N2) and H3N2-M1_swineUSA+pUnlike_2008-2012 (classical H3N2) working sets by ticking the corresponding checkboxes.
   ii. Click More Actions above the table. A lightbox will pop up. Click Combine to combine the selected working sets into a new working set. Name the new working set H3N2-M1_swineUSA_2008-2012_pLike_and_pUnlike and click Combine.
   iii. The combined working set will appear at the top of the Workbench table.

b. Click the icon next to the new working set to display the sequences in the working set.

c. On the working set page, select all records by checking the checkbox above the table, mouse over the Run Analysis button, and click Align Sequences (MSA).

d. A Select Sequence Type lightbox will pop up. Choose Amino Acid M1 to align M1 protein sequences. Click Continue.

e. The Align Sequences (MSA) page will load. Choose the desired sequence display options and click Run.

f. As soon as the alignment is finished, the visualized sequence alignment window will load. Many customization options are available in the Jalview alignment window.
   i. Can you find the residues that distinguish the variant H3N2 viruses from the classical H3N2 viruses? Compare the variant positions with those identified by Meta-CATS.
   ii. Color the alignment based on a sequence identity cutoff. To do so, click the Colour menu and then Above Identity Threshold. Use the slider to adjust the color display so that only residues with >80% sequence identity are colored.
   iii. Scroll left and right to view the alignment. View the consensus sequence and the bar graph of conservation scores at the bottom of the alignment window.
   iv. If you need help with Jalview, click the Help menu and then Documentation to access the Jalview documentation site.
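
The visual comparison in step f can be approximated in a few lines of Python. The sketch below takes two groups of already aligned, equal-length sequences, computes the consensus residue and its frequency for every column in each group, and reports the columns where both groups are well conserved (above an identity threshold, 80% here to mirror step f.ii) but have different consensus residues. The function names and the four-residue toy sequences are hypothetical; this is not how Jalview or Meta-CATS computes its results.

```python
from collections import Counter


def column_consensus(column):
    """Return (most common residue, fraction of sequences carrying it)."""
    residue, count = Counter(column).most_common(1)[0]
    return residue, count / len(column)


def distinguishing_positions(group1, group2, threshold=0.8):
    """List (position, residue1, residue2) for columns that separate the groups.

    A column is reported when each group's consensus residue exceeds
    `threshold` identity within that group and the two residues differ.
    Positions are 1-based alignment columns.
    """
    positions = []
    columns = zip(zip(*group1), zip(*group2))
    for index, (col1, col2) in enumerate(columns, start=1):
        residue1, fraction1 = column_consensus(col1)
        residue2, fraction2 = column_consensus(col2)
        if residue1 != residue2 and fraction1 > threshold and fraction2 > threshold:
            positions.append((index, residue1, residue2))
    return positions


# Toy aligned sequences (made up for illustration, not M1 data).
variant = ["MSRT", "MSRT", "MSRT"]
classical = ["MSKT", "MSKT", "MSKA"]

print(distinguishing_positions(variant, classical))
```

In the toy data only column 3 is reported: column 4 varies within the classical group, but both groups share the same consensus residue there.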