Principles of phylogenetic analysis

Principles of phylogenetic analysis
Arne Holst-Jensen, NVI, Norway. Fusarium course, Ås, Norway, June 22nd 2008

Distance-based methods
Compare OTUs pairwise (e.g. A and B) across all characters
Simple approach: join the most similar OTUs
  Cluster ≠ phylogeny!
  Evolutionary clock? Constant substitution rate?
More sophisticated, e.g. neighbor-joining
  Build the phylogeny on the minimum distance (D min) for the total tree
  Starting point: star tree
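The pairwise comparison and joining steps above can be sketched roughly as follows. This is a minimal illustration, not the course material: the toy sequences, the function names, the use of uncorrected p-distances and the Jukes-Cantor correction are all assumptions made for the example. `nj_pair` only shows the first neighbor-joining step from a star tree (the pair that minimises the standard Q criterion), not a full tree construction.

```python
import math
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites, skipping positions with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jc69_distance(p):
    """Jukes-Cantor correction for multiple hits; only defined for p < 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(seqs):
    """Map each unordered pair of OTU names to its p-distance."""
    return {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(sorted(seqs), 2)}

def most_similar_pair(dist):
    """Simple clustering step: the pair with the smallest distance."""
    return tuple(sorted(min(dist, key=dist.get)))

def nj_pair(dist, taxa):
    """First neighbor-joining step from a star tree: minimise Q(i, j)."""
    n = len(taxa)
    r = {t: sum(dist[frozenset((t, u))] for u in taxa if u != t) for t in taxa}

    def q(pair):
        i, j = pair
        return (n - 2) * dist[frozenset(pair)] - r[i] - r[j]

    return min(combinations(sorted(taxa), 2), key=q)

# Hypothetical toy alignment, not data from the course.
toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "ACGAACGGAT", "D": "TCGAACGGTT"}
dist = distance_matrix(toy)
print(most_similar_pair(dist), nj_pair(dist, list(toy)))
```

With four OTUs the pairs (A, B) and (C, D) describe the same split of the unrooted tree, so neighbor-joining may report either one.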

Parsimony analysis
Construct the tree with the fewest changes
  One A/C change on the tree = 1 change
  Two independent A/C changes = 2 changes (parallel)
Find the shortest way through the data!
Gap handling, recoding, step matrices
  Simple = presence / absence
  Recoding
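Counting changes on a given tree can be illustrated with Fitch's small-parsimony algorithm for one unordered character. The sketch below assumes a rooted binary tree written as nested tuples and hypothetical taxon names t1 to t4; it reproduces the 1-change versus 2-change comparison for an A/C character.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one character.

    tree   -- a leaf name (str) or a pair (left_subtree, right_subtree)
    states -- dict mapping leaf name to its observed state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed here.
        return common, left_changes + right_changes
    # The children disagree: one change must occur on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1

# Hypothetical character: two taxa carry A, two carry C.
site = {"t1": "A", "t2": "C", "t3": "A", "t4": "C"}
tree_1 = (("t1", "t3"), ("t2", "t4"))  # groups identical states together
tree_2 = (("t1", "t2"), ("t3", "t4"))  # needs two parallel changes

print(fitch(tree_1, site)[1], fitch(tree_2, site)[1])
```

Summing these counts over all characters gives the tree length; the parsimony search then looks for the tree with the smallest total.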

Maximum likelihood analysis
Describe a model of evolution
  Substitution rates, base frequencies
Create a tree, map characters to the tree
Probability of tree (P_t) = product of the probabilities of the characters across the tree (a sum on the log scale)
Determine the probabilities of trees
Compare probabilities: is ΔP_t12 = P_t1 − P_t2 significant?
  Given the evolutionary model
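As a rough illustration of how the probability of a tree is obtained from per-character probabilities, the sketch below applies Felsenstein's pruning algorithm under the Jukes-Cantor (JC69) model. The model choice, the branch lengths, the tree encoding (an internal node is a list of (child, branch length) pairs) and the toy alignment are all assumptions for the example; gaps and ambiguity codes are not handled.

```python
import math

BASES = "ACGT"

def jc69_prob(i, j, t):
    """Probability that base i becomes base j along a branch of length t (JC69)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial(node, site):
    """Conditional likelihood of the data below `node` for each possible base at `node`."""
    if isinstance(node, str):
        return {b: 1.0 if site[node] == b else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        child_like = partial(child, site)
        for b in BASES:
            result[b] *= sum(jc69_prob(b, c, t) * child_like[c] for c in BASES)
    return result

def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods, with equal base frequencies at the root."""
    length = len(next(iter(alignment.values())))
    total = 0.0
    for k in range(length):
        site = {name: seq[k] for name, seq in alignment.items()}
        root = partial(tree, site)
        total += math.log(sum(0.25 * root[b] for b in BASES))
    return total

# Hypothetical data: t1 and t3 are identical, as are t2 and t4.
aln = {"t1": "AAGC", "t2": "CAGT", "t3": "AAGC", "t4": "CAGT"}
tree_1 = [([("t1", 0.1), ("t3", 0.1)], 0.1), ([("t2", 0.1), ("t4", 0.1)], 0.1)]
tree_2 = [([("t1", 0.1), ("t2", 0.1)], 0.1), ([("t3", 0.1), ("t4", 0.1)], 0.1)]
print(log_likelihood(tree_1, aln), log_likelihood(tree_2, aln))
```

The difference between the two log-likelihoods is the quantity that a test such as Kishino-Hasegawa evaluates for significance; the test itself is not implemented here.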

ML vs. Bayesian likelihood
ML searches for the best tree given the evolutionary model and the observed data
  Kishino-Hasegawa test compares the probabilities of trees
Bayesian analysis, MCMC simulation
  Create trees based on the evolutionary model
  Prior probability
  Determine likelihood of data given model
  Optimal hypothesis = maximum posterior probability
  Posterior probability ∝ prior probability of tree × likelihood of data
  Determined for internal branches in the tree sample
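The relation between prior, likelihood and posterior can be made concrete for a handful of candidate trees. This is only a sketch under strong assumptions: real Bayesian programs sample trees by MCMC because the trees cannot be enumerated, and the log-likelihood values and tree labels below are invented for illustration.

```python
import math

def posterior(log_likelihoods, priors):
    """Normalised posterior over a small, fully enumerated set of hypotheses."""
    # Subtract the maximum log-likelihood before exponentiating to avoid underflow.
    top = max(log_likelihoods.values())
    weights = {name: priors[name] * math.exp(ll - top)
               for name, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical log-likelihoods for three candidate trees, with a flat prior.
log_l = {"tree_1": -1203.4, "tree_2": -1205.1, "tree_3": -1210.0}
flat_prior = {name: 1.0 / len(log_l) for name in log_l}
post = posterior(log_l, flat_prior)
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

With a flat prior the tree with the highest likelihood also has the highest posterior probability; an informative prior can change that ranking.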

Nice-to-know stuff
Long-branch attraction
Support assessment
  Decay/Bremer support (parsimony)
  Consensus, bootstrap / jackknife
  Confidence intervals
Rooting of tree
  Outgroup to polarise, define ancestral states
  Midpoint
  Unrooted
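Bootstrap and jackknife support both start by resampling the columns (characters) of the alignment; each resampled alignment is then analysed like the original. A minimal sketch of the resampling step only, assuming an alignment stored as a dict of equal-length strings and a hypothetical 50% retention rate for the jackknife:

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def jackknife_alignment(alignment, rng, keep=0.5):
    """Keep a random subset of columns, sampled without replacement."""
    length = len(next(iter(alignment.values())))
    cols = sorted(rng.sample(range(length), max(1, int(length * keep))))
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

# Hypothetical toy alignment; the fixed seed only makes the example repeatable.
toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "ACGAACGGAT"}
rng = random.Random(1)
replicate = bootstrap_alignment(toy, rng)
reduced = jackknife_alignment(toy, rng)
print(replicate["A"], reduced["A"])
```

The support for a branch is the fraction of replicate trees in which that branch appears.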

Whole genome based phylogeny from a Fusarium point of view R K A H J

Fungal genomes
Fewer than 50 complete fungal genome sequences
An overview of genome databases: http://www3.oup.co.uk/nar/databases/c/
32 publicly available fungal genome databases (31.12.2007)
  Some are listed more than once!

Fusarium genomes
Three Fusarium genomes sequenced
F. graminearum (2003)
  The second plant-pathogenic fungal genome made publicly available
  Size: ~40 Mb
  Chromosomes: 4
  Genes: 13,332
F. verticillioides
  Size: 41.8 Mb
  Chromosomes: 12
  Genes: 14,179
F. oxysporum
  Size: 61.4 Mb
  Chromosomes: ?
  Genes: 17,735

Fusarium genomes
Two Fusarium genomes nominated as candidates at http://www.broad.mit.edu
  F. proliferatum
  F. solani (Nectria haematococca)
Expressed sequence tag library
  F. sporotrichioides: 7517 ESTs

Whole genome based phylogeny
Most (all?) use protein sequences
  Too much information in DNA sequences
  Effectively impossible to establish phylogeny
Strong selection criteria for proteins included in studies
  Must be represented in all isolates studied
  Many genes are not annotated
  BLASTP to find homologous sequences
  Excluding gene families with >1 representative
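The selection criteria above (present in every genome, no more than one representative per genome) amount to a simple filter on gene families. The sketch below assumes the families have already been built, for example from BLASTP hits, and are stored as lists of (genome, protein id) pairs; the family names, genome names and ids are hypothetical.

```python
from collections import Counter

def single_copy_families(families, genomes):
    """Keep families with exactly one member in every genome of the study."""
    wanted = set(genomes)
    kept = {}
    for name, members in families.items():
        counts = Counter(genome for genome, _protein in members)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            kept[name] = members
    return kept

# Hypothetical families for three genomes g1, g2 and g3.
families = {
    "fam1": [("g1", "p1"), ("g2", "p2"), ("g3", "p3")],              # kept
    "fam2": [("g1", "p4"), ("g1", "p5"), ("g2", "p6"), ("g3", "p7")],  # duplicate in g1
    "fam3": [("g1", "p8"), ("g2", "p9")],                            # missing from g3
}
kept = single_copy_families(families, ["g1", "g2", "g3"])
print(sorted(kept))
```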

Whole genome based phylogeny
Best approach for reconstructing genome phylogenies?
Supertree methods vs. concatenated methods

Supertree methods
Supertrees are phylogenies assembled from smaller phylogenies that share some, but not necessarily all, taxa in common.
Supertrees can make novel statements about relationships of taxa that do not co-occur on any single input tree while still retaining hierarchical information from the input trees.

Supertree methods
Conventional studies
  Source data: measurable attribute of an organism
  Basic unit: character
  Can be viewed as a putative statement of relationship
Supertrees
  Source data: phylogenies
  Basic unit: membership criterion / statement of relationship (branching topology)
  At best, can be viewed as a proxy for a shared derived character

Supertree construction
[Figure: source trees on overlapping subsets of taxa A to L combined into a single supertree]
Direct: consensus-like techniques
Indirect: coding technique, optimization criterion

Supertree methods
Direct
  Strict consensus supertrees
  MinCutSupertree (and variants)
  Semi-strict supertrees
Indirect
  Most matrix representation (MR) supertrees
    Parsimony (MRP and variants)
    Compatibility (MRC)
    Minimum flip supertrees (MRF)
    Average consensus (MRD)
  Gene tree parsimony
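For the indirect methods, the coding step can be illustrated with the standard matrix-representation coding used by MRP: every clade of every source tree becomes one binary character, scored 1 for taxa inside the clade, 0 for other taxa in that source tree, and ? for taxa absent from it. The sketch assumes rooted source trees written as nested tuples with hypothetical taxon names; the parsimony analysis of the resulting matrix is not shown.

```python
def clades(tree):
    """Return (leaf set, list of clades including the root) of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found += child_clades
    found.append(leaves)
    return leaves, found

def mrp_matrix(trees):
    """Code every non-root clade of every source tree as one 0/1/? character."""
    all_taxa = sorted(set().union(*(clades(t)[0] for t in trees)))
    rows = {taxon: "" for taxon in all_taxa}
    for tree in trees:
        leaves, found = clades(tree)
        for clade in found:
            if clade == leaves:
                continue  # the root clade contains every taxon and is uninformative
            for taxon in all_taxa:
                if taxon in clade:
                    rows[taxon] += "1"
                elif taxon in leaves:
                    rows[taxon] += "0"
                else:
                    rows[taxon] += "?"
    return rows

# Two hypothetical source trees that share only taxa C and D.
source_trees = [(("A", "B"), ("C", "D")), (("C", "D"), "E")]
matrix = mrp_matrix(source_trees)
for taxon, row in matrix.items():
    print(taxon, row)
```

Taxon E never occurs together with A or B in a source tree, yet all five taxa end up in one matrix that a parsimony program can analyse.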

Concatenated methods
Construct multiple concatenated protein sequence alignments
Maximum likelihood analysis on the concatenated protein sequence alignments from multiple protein families

Concatenated methods
Multiple sequence alignments
  Each in principle coding for topology
Concatenated sequence alignment
  Corresponding to one very long protein
Phylogenetic analysis of the concatenated sequence alignment
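Joining per-family alignments into one very long alignment is mechanically simple. A minimal sketch, assuming each alignment is a dict from taxon name to an aligned sequence and that every taxon is present in every family; the taxon names and sequences are hypothetical.

```python
def concatenate(alignments, taxa):
    """Join per-family alignments end to end, one long sequence per taxon."""
    parts = {taxon: [] for taxon in taxa}
    for index, aln in enumerate(alignments):
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"alignment {index} has sequences of unequal length")
        for taxon in taxa:
            if taxon not in aln:
                # Strict by assumption: families must be represented in all taxa.
                raise ValueError(f"taxon {taxon} missing from alignment {index}")
            parts[taxon].append(aln[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

# Two hypothetical protein family alignments for two taxa.
family_alignments = [{"t1": "MK", "t2": "MR"}, {"t1": "LLV", "t2": "LIV"}]
supermatrix = concatenate(family_alignments, ["t1", "t2"])
print(supermatrix)
```

Keeping track of where each family starts and ends in the joined alignment is useful if a separate substitution model is to be applied per family; that bookkeeping is left out here.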

Whole genome based phylogeny
An example: a dataset of 345,829 genes from 42 fungal genomes
F. graminearum and F. verticillioides included

A fungal phylogeny based on 42 complete genomes: supertree method
ClustalW on the 5316 protein families
  Manual adjustment of alignments not possible
  Only conserved alignment blocks used
  Average alignment length reduced from 697 sites to 214 sites
Permutation tail probability test
  Better than random (P>0.001)
  511 alignments failed
4805 alignments used in phylogenetic analysis

A fungal phylogeny based on 42 complete genomes: supertree method
MultiPhyl: protein substitution models
Reconstruct ML phylogeny for each gene family
100 bootstrap replicates on all 4805 alignments!
Results summarised: 70% majority-rule consensus
These results used as input in the supertree analysis
Supertree analysis using matrix representation with parsimony (MRP)
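Summarising bootstrap replicates as a 70% majority-rule consensus means keeping only the clades found in at least 70% of the replicate trees. The sketch below shows that counting step for rooted trees written as nested tuples; the ten replicate trees are invented for illustration, and assembling the retained clades into a consensus tree is not shown.

```python
from collections import Counter

def clades(tree):
    """Return (leaf set, list of clades including the root) of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found += child_clades
    found.append(leaves)
    return leaves, found

def majority_clades(trees, threshold=0.7):
    """Map each clade seen in at least `threshold` of the trees to its frequency."""
    counts = Counter()
    for tree in trees:
        leaves, found = clades(tree)
        counts.update(c for c in found if c != leaves)  # skip the trivial root clade
    total = len(trees)
    return {clade: n / total for clade, n in counts.items() if n / total >= threshold}

# Ten hypothetical bootstrap trees: eight group A with B, two group A with C.
replicates = ([((("A", "B"), "C"), ("D", "E"))] * 8
              + [((("A", "C"), "B"), ("D", "E"))] * 2)
support = majority_clades(replicates)
for clade, freq in sorted(support.items(), key=lambda item: sorted(item[0])):
    print(sorted(clade), freq)
```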

MSSA supertree derived from 4,805 fungal gene families. Bootstrap scores for all nodes are displayed. Rhizopus oryzae has been selected as an outgroup. The Basidiomycota and Ascomycota phyla form distinct clades. Subphyla and class clades are highlighted.

A fungal phylogeny based on 42 complete genomes: concatenated method
All proteins compared in FASTP to find orthologs
Form multi-gene clusters of orthologs
  Only clusters with exactly one member per species: 227 protein families
  Filtered out genes with no syntenic evidence: 153 gene families used for further studies
Individual gene families aligned, manually adjusted and concatenated
  38 000 amino acid alignment!
ML phylogeny

Maximum likelihood phylogeny reconstructed using a concatenated alignment of 153 universally distributed fungal genes. The concatenated alignment contains 42 taxa and exactly 38,000 amino acid positions.

Phylogeny
High degree of concordance between supertree method and concatenated method
Fusarium forms a monophyletic group with Trichoderma reesei as closest sister group
The inference agreed with previous single-gene phylogeny studies

Genome vs. multiple-gene phylogeny
[Figure label: Sordariomycetes]
James et al., 2006. Nature 443: 818-822
6-gene phylogeny of nearly 200 fungal species

Fungal phylogeny
High degree of overall congruence between the two phylogenetic methods
A closer look at Sordariomycetes
  Supertree method: 4805 protein families
  Bayesian analysis: 6 genes

Phylogeny: why, what & when?
Arne Holst-Jensen, National Veterinary Institute, Norway, arne.holst-jensen@vetinst.no

Phylogeny: the evolutionary history and line of descent of a taxon
Usually reconstructed based on available data (characters)
Applicable also outside biology

Why?
Evolutionary relationships
  Taxa
  Character evolution
  Systems biology
Classification of taxa
Identity verification
Identify diagnostic features
Prediction of features

What?
Characters
  Phenotypic
  Genotypic
Entities, often termed OTUs (operational taxonomic units)
But the principle is widely applicable

What, continued
Character types:
  Two-state: presence / absence
  Multistate, e.g.:
    DNA sequences (A, C, G, T, gaps)
    Very long, long, medium, short, very short
  Ordered, e.g.: very long, long, medium, ...
  Unordered, e.g.: DNA sequences
  Polymorphic or missing characters

What, continued 2
Principles of phylogenetic analysis
  Distance-based methods, e.g. N-J
    Minimise distance across the global tree
  Parsimony-based methods
    Minimise the number of steps globally
  Maximum likelihood methods
    Probability of tree with evolutionary model
Cluster analysis ≠ phylogenetic analysis!

When?
Establish or test evolutionary tree
Appropriate data available
Revise classification
Develop diagnostics
Predict features of OTU(s)
Play with real-life computer game!
Rationalise resource usage

Data retrieval
http://fusarium.cbio.psu.edu/
http://srs.ebi.ac.uk/
RK_EF1a_SRS.fas
http://blast.ncbi.nlm.nih.gov/blast.cgi
New BlastN search