Section D. Identification of serotype-specific amino acid positions in DENV NS1. Objective

Upon completion of this exercise, you will be able to use the Virus Pathogen Resource (ViPR; http://www.viprbrc.org/) to:

- Search for virus sequences and view sequence annotations in ViPR
- Search for immune epitopes
- Save selected sequences as a working set in your private Workbench space
- Use Meta-CATS to identify nucleotide or amino acid positions that significantly differ between groups of virus sequences
- Perform a multiple sequence alignment to observe sequence conservation and variation

In this use case, we will identify serotype-specific amino acid positions in NS1, which could then be used to produce serotype-specific antibodies and reagents for use in a rapid clinical test.

I. Search for sequences and save sequences into working sets

a. Mouse over the Virus Families tab and select Dengue to get to the Dengue virus homepage.
b. In the Search Data section, click Genes & Proteins.
c. The Gene/Protein Search page will be loaded. You will notice you have many options to search: virus serotype, gene symbol, gene product name, virus attributes, host attributes, clinical attributes, etc. Select the following search options, noting how the "Results matching your criteria" count changes as you define new search criteria, and then click the Search button.
   Dengue virus 1: Select All
   Dengue virus 2: Select All
   Dengue virus 3: Select All
   Dengue virus 4: Select All
   Complete Genomes Only: checked
   Gene Symbol (radio button): NS1
   Advanced options: Remove Identical Protein Sequences (checkbox)

   Results matching your criteria: 771

   [Screenshot: Gene/Protein Search page with the search options above selected]

d. The Search Results page will be displayed. Here you can:
   i. Sort records by a display field by clicking the corresponding column header. Note: You can do advanced sorting by clicking the Display Settings button located above the result table.
   ii. Save the search query to your Workbench and rerun the search later.
   iii. Download sequences (CDS, protein) by selecting sequences and then clicking Download.
   iv. Store selected sequences in a working set by clicking Add to Working Set.
   v. Analyze sequences by selecting the desired sequences and choosing an analysis option under the Run Analysis menu.
   vi. Click View next to a strain name to view the Gene/Protein details.
e. From the Search Results page, select all NS1 protein records by ticking the checkbox above the table. Then click the Add to Working Set button to add them to a working set.
f. You'll be prompted to log in to your Workbench account in order to save data to a working set. If you don't already have an account, register for one for free by choosing the Register for a new account option and following the prompts.
g. An Add to Working Set lightbox will pop up. Create a new working set and name it Unique DENV NS1 proteins. Click Add to Working Set to save the selected records to the working set.
h. Click the Workbench tab in the grey navigation bar. You will see the working set Unique DENV NS1 proteins listed at the top of the table.
i. Click View to display the records saved in the working set.

II. Metadata-driven Comparative Analysis Tool for Sequences (Meta-CATS)

Meta-CATS is a unique comparative genomics analysis tool in ViPR that identifies nucleotide or amino acid positions that significantly differ between two or more groups of virus sequences.
Meta-CATS consists of three parts:

- a multiple sequence alignment (using MUSCLE);
- a chi-square goodness of fit test to identify positions (columns) of the multiple sequence alignment that significantly differ from the expected (random) distribution of residues between all metadata groups;
- a Pearson's chi-square test to identify the specific pairs of metadata groups that contribute to the observed statistical difference.

a. On the Working Set Details page, select all NS1 records by ticking the Select all checkbox, mouse over Run Analysis, and click Metadata-driven Comparative Analysis Tool.

b. The Meta-CATS page will be loaded. Choose the Auto Grouping radio button, choose Viral Species from the drop-down list, and then click Continue.
c. On the Meta-CATS Setup Subset page, you will see that the selected records have been separated into 4 groups based on serotype. Confirm the grouping and click Run.
d. Once the analysis is finished, the Meta-CATS Report will be displayed. The Meta-CATS analysis result has two reports:
   - a Chi-square Goodness of Fit test result table listing the positions that have a significant non-random distribution between your specified groups;
   - a Pearson's chi-square test result table listing the specific pairs of groups that contribute to the observed statistical difference.
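To make the two result tables more concrete, the following is a minimal Python sketch of the kind of per-column statistic they summarize. It is not ViPR's implementation: the function names `chi_square_column` and `pairwise_chi_square`, the toy alignment, and the choice of a plain chi-square statistic on a group-by-residue contingency table are all assumptions made for illustration. It reports only the statistic and the degrees of freedom; turning these into a p-value requires the chi-square distribution (for example `scipy.stats.chi2.sf`, if SciPy is available).

```python
from collections import Counter
from itertools import combinations


def chi_square_column(groups, pos):
    """Chi-square statistic and degrees of freedom for one alignment column.

    groups: dict mapping a group name to a list of aligned sequences
    pos:    1-based alignment position (column) to test
    """
    # Count the residues observed at this column within each group.
    counts = {g: Counter(seq[pos - 1] for seq in seqs) for g, seqs in groups.items()}
    residues = sorted(set().union(*counts.values()))
    row_totals = {g: sum(c.values()) for g, c in counts.items()}
    col_totals = {r: sum(c[r] for c in counts.values()) for r in residues}
    total = sum(row_totals.values())

    stat = 0.0
    for g in counts:
        for r in residues:
            # Expected count if residues were distributed independently of group.
            expected = row_totals[g] * col_totals[r] / total
            stat += (counts[g][r] - expected) ** 2 / expected
    dof = (len(counts) - 1) * (len(residues) - 1)
    return stat, dof


def pairwise_chi_square(groups, pos):
    """Repeat the test for every pair of groups at one column."""
    return {
        (a, b): chi_square_column({a: groups[a], b: groups[b]}, pos)
        for a, b in combinations(sorted(groups), 2)
    }


# Hypothetical toy alignment (NOT real NS1 data): column 3 differs by group.
toy = {
    "DENV-1": ["AKIST", "AKIST", "AKIST"],
    "DENV-2": ["AKLST", "AKLST", "AKLSS"],
    "DENV-3": ["AKVST", "AKVST", "AKVST"],
    "DENV-4": ["AKFST", "AKFST", "AKFST"],
}

for position in (1, 3):
    stat, dof = chi_square_column(toy, position)
    print(f"column {position}: chi-square = {stat:.2f}, dof = {dof}")
```

In this toy data, column 1 is identical in every group and yields a statistic of zero, while column 3 carries a different residue in each group and yields a large statistic. This is the same pattern to look for in the real report when comparing a conserved position with a serotype-specific one.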

e. Examine position 47 and neighboring positions in both the top and bottom tables. What do you observe regarding the amino acids present in the different serotype groups?
f. Examine positions 124 and 191 as well.

III. Visualize protein sequence alignment

Now we are going to view the protein sequence alignment to confirm the Meta-CATS results.

a. From the Meta-CATS report page, click Visualize Aligned Sequences at the top of the page.
   Note: You can also run an alignment on the saved sequence working set by navigating to the working set in your Workbench area and then clicking the Visualize Aligned Sequences option in the Run Analysis pull-down menu.
b. The alignment is presented in the Jalview visualization window. The window is interactive.
   i. The consensus sequence is shown at the bottom of the window. You can choose to show sequence logos by right-clicking on the consensus and then selecting "Show logo".

   ii. You can manually adjust the alignment and display using the various gray menu options.
   iii. Scroll right to position 124. Note that the residue at this position correlates with serotype (DENV-1: I; DENV-2: L; DENV-3: V; DENV-4: F).

IV. Immune epitope locations

a. Mouse over the Search Data tab and click Immune Epitopes.
b. The Immune Epitope Search page will be loaded. Select the following search criteria, and then click the Search button.

   Flavivirus -> Dengue virus: Select All
   Gene Symbol: NS1
   Epitope Type: Experimentally Determined Epitopes
   B cell: Positive (checkbox)
   Host: human

c. The Experimentally Determined Epitope Search Results page will be displayed. Find IEDB ID 2246. Serotype-specific position 124 identified by Meta-CATS is located within this B-cell epitope.

   [Screenshot: Experimentally Determined Epitope Search Results. The search returned 32 epitopes, sorted by IEDB ID in ascending order; the first rows shown are IEDB ID 2246 (epitope sequence AKMLSTELH) and IEDB ID 10124 (epitope sequence DSGCVVSWK).]
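The check in step c, whether a position of interest falls inside an epitope, can also be done programmatically once you have a protein sequence in hand. The following is a minimal Python sketch, assuming an exact string match of the epitope against an ungapped sequence. The helper names `epitope_span` and `position_in_epitope` and the short toy sequence are hypothetical; the toy sequence is not a real NS1 sequence and its coordinates do not correspond to NS1 numbering.

```python
def epitope_span(protein_seq, epitope_seq):
    """Return the 1-based (start, end) of the first exact match, or None."""
    idx = protein_seq.find(epitope_seq)
    if idx == -1:
        return None
    return idx + 1, idx + len(epitope_seq)


def position_in_epitope(protein_seq, epitope_seq, position):
    """True if the 1-based position lies within the matched epitope."""
    span = epitope_span(protein_seq, epitope_seq)
    return span is not None and span[0] <= position <= span[1]


# Hypothetical toy sequence: filler residues around the IEDB 2246 epitope.
toy_seq = "GSGSG" + "AKMLSTELH" + "GSGS"

print(epitope_span(toy_seq, "AKMLSTELH"))
print(position_in_epitope(toy_seq, "AKMLSTELH", 13))
print(position_in_epitope(toy_seq, "AKMLSTELH", 3))
```

When applying this to real data, keep in mind that Meta-CATS reports positions in alignment coordinates. If the alignment contains gaps, those positions must first be mapped back to the ungapped sequence before comparing them with an epitope span.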