Phylogenetic Methods Multiple Sequence lignment Pairwise distance matrix lustering algorithms: NJ, UPM - guide trees Phylogenetic trees
Nucleotide vs. amino acid sequences for phylogenies ) Nucleotides: - Synonymous vs. nonsynonymous substitutions - ransitions vs. transversions - oding vs. non-coding sequences - an analyze pseudogenes ) mino acids: - Distances can be very large for nucleotides - 0 characters, greater phylogenetic signal oday: ) Rooting phylogenetic trees B) Number of phylogenetic trees ) ree building (character, distance) D) esting the robustness of the tree E) esting alternative tree topologies F) Influenza
Inferring evolutionary relationships requires rooting the tree o root a tree, imagine that the tree is made of string. rab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root: B Root B D D Unrooted tree Rooted tree Root here are two major ways to root trees: By outgroup: pick outgroup that is not too tart, not too sweet outgroup By midpoint or distance: on longest path; need to be sure evolutionary rates are same for all taxa 0 B d (,D) = 0 + + = Midpoint = / = 9 D
he number of possible trees grows quickly # OUs 0 0 0 n Unrooted trees,0,0.9 x 0. x 0 0.0 x 0 (n - )! / n- (n-)! Rooted trees 0,9,. x 0. x 0. x 0 (n - )! / n- (n-)! here are ~0 9 protons in the universe omputational methods for finding optimal trees Exhaustive algorithms: Evaluates all possible trees, choosing the one with the best score. Heuristic algorithms: pproximate methods that attempt to find the optimal tree for the method of choice, but cannot guarantee to do so.
How do we build a phylogenetic tree? ) Distance-based methods: - ransform the aligned sequences into pairwise distances - Use the distance matrix during tree building (UPM, Neighbor joining, etc.) - Decisions: how to deal with gaps? correction for multiple substitutions? How do we build a phylogenetic tree? ) haracter-based methods: - Examine aligned sequences, pick informative sites - Build tree that requires smallest number of changes (Maximum parsimony) - Or that has highest likelihood of producing data based on a sequence evolution model (Maximum likelihood)
Maximum parsimony methodology I IS VIN O DO WIH MORE WH N BE DONE WIH FEWER OR Principle of parsimony OR smallest number of evolutionary changes he most-parsimonious tree is the one that requires the fewest number of evolutionary events (e.g., nucleotide or amino acid substitutions) to explain the sequences observed in the taxa. Maximum parsimony methodology Step : Identify informative sites Sites with at least two different characters at the site, each of which is represented in at least two of the sequences Site Seq. 9
Maximum parsimony methodology Step : Identify informative sites Sites with at least two different characters at the site, each of which is represented in at least two of the sequences 9 Site Seq. Sites where all trees require the same number of changes are not informative 9 Seq. Site ree I ree II ree III = changes
MP analyzes sites at which one substitution model requires fewer changes 9 Seq. Site ree I ree II ree III = changes MP analyzes sites at which one substitution model requires fewer changes 9 Seq. Site ree I ree II ree III = changes
9 MP analyzes sites at which one substitution model requires fewer changes 9 Seq. Site ree I ree II ree III = changes Maximum parsimony methodology Step : alculate minimum number of substitutions at each informative site Step : Sum number of changes at each informative site for each possible tree he tree(s) with the least number of total changes is/are the most parsimonious tree(s) ree I ree II ree III ree I 9 # s @ site
Maximum parsimony computations Up to ~0 OUs: can do exhaustive search - Start with taxa in a tree, add one taxon at a time - Look at all possible trees, select best tree 0-0 OUs: start being selective - Determine a reasonably good threshold tree length - Pursue only those trees shorter than a threshold >0 OUs: heuristic search - educated guesses -Draw initial tree with fast algorithm -Search for shorter trees by examining only trees with similar topology; pruning and regrafting Bootstrapping is used to evaluate the robustness of phylogenetic trees ) Start with original dataset and original tree ) Randomly re-sample with replacement to obtain alignment of equal size (pseudo-sample) ) Build tree with re-sampled data, repeat 00-000x ) Determine frequency with which each clade in original tree is observed in pseudo-trees 0
Bootstrapping a phylogenetic tree B D 9 0 % time the same nodes were recovered Resample with replacement B D 9 Build tree with pseudosample Bootstrapping a phylogenetic tree B D 9 0 % time the same nodes were recovered Resample with replacement B D 0 Build tree with pseudosample
How are bootstrapping values interpreted? Measures how strongly the phylogenetic signal is distributed through the multiple sequence alignment Values > 0% are considered to support clade designations (estimated p < 0.0) ssumes samples are reasonably representative of larger population Which of two good trees are better? outgroup How is this tree? outgroup? Different methods for distance, MP, and ML trees
Influenza virus ssrn genome, ~, bases enome in segments, 0- genes Influenza virus genes enome segment Segment size (bases) ene(s) ene function 0 90 PB PB PB-F P H NP N M M NS NS ranscriptase: cap binding ranscriptase: elongation; Induces apoptosis ranscriptase: protease activity Hemagglutinin: host cell recognition Nucleoprotein: RN binding; transcriptase complex; vrn transport Neuraminidase: release of virus Matrix protein: major component of virion Integral membrane protein - ion channel Non-structural: RN transport, splicing, translation. nti-interferon. Non-structural: nucleus and cytoplasm, vrn export (NEP)
Influenza nomenclature Subtype nomenclature based on H and N genes Hemagglutinins, 9 Neuraminidases Human: H:,, ; N:,; Birds: all combinations Influenza virus can change rapidly High mutation rate (antigenic drift) Reassortment (antigenic shift) wo different viruses infect same cell an produce hybrid viruses
Reassortment can produce pandemic influenza viruses 9 sian flu: HN, avian flu segments, human flu segments 9 Hong Kong flu: HN, avian flu segments, human flu segments Reassortment in pigs - susceptible to avian, human, and swine flus 9 influenza pandemic Highly virulent flu virus ( Spanish flu ) Estimated deaths: 0-00 million worldwide (of. billion) Many people died within a few days from acute pneumonia Many fatalities were young and healthy people Lowered average U.S. life expectancy by 0 years
Spread of the 9 flu in the U.S. 9 influenza questions Where did the 9 flu come from? Why was the 9 flu so pathogenic? Is it possible for a 9-like pandemic to happen again?
vian flu HN Has jumped to humans (> 0 people infected) Very little immunity in humans: mortality rate ~0% an have similar pathology to 9 virus How close is avian flu to being able to efficiently infect humans and spread from human to human?