Principles of phylogenetic analysis
Arne Holst-Jensen, NVI, Norway. Fusarium course, Ås, Norway, June 22nd 2008
Distance based methods
- Compare OTUs and characters
- Pairwise: A and B; X characters
- Simple approach: join the most similar OTUs
  - Cluster ≠ phylogeny!
  - Evolutionary clock? Substitution rate
- More sophisticated, e.g. Neighbor-joining
  - Build phylogeny on the minimum distance (D min) for the total tree
  - Starting point: star-tree
- [Figure: OTUs A, B, C and D with distances X and 2X]
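The contrast between simple clustering and neighbor-joining can be sketched in a few lines of Python. This is a minimal illustration, not a full tree-building program: the function names (`p_distance`, `closest_pair`, `nj_pair`) and the four-taxon distance table are hypothetical. The distances are additive on a tree ((A,B),(C,D)) in which B and D evolve four times faster than A and C.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def closest_pair(dist):
    """Simple clustering step: the pair with the smallest raw distance."""
    return min(dist, key=dist.get)

def nj_pair(names, dist):
    """First neighbor-joining step: the pair minimising
    Q(i, j) = (n - 2) * d(i, j) - r_i - r_j, where r_i is the sum of
    distances from i to all other OTUs."""
    def d(i, j):
        return 0 if i == j else dist.get((i, j), dist.get((j, i)))
    n = len(names)
    r = {i: sum(d(i, k) for k in names) for i in names}
    return min(combinations(names, 2),
               key=lambda p: (n - 2) * d(*p) - r[p[0]] - r[p[1]])

# Hypothetical distances: true tree ((A,B),(C,D)), branches A=1, B=4, C=1, D=4,
# internal branch = 1. Unequal rates violate the evolutionary clock.
NAMES = ["A", "B", "C", "D"]
DIST = {("A", "B"): 5, ("A", "C"): 3, ("A", "D"): 6,
        ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5}

print(closest_pair(DIST))    # most similar pair by raw distance
print(nj_pair(NAMES, DIST))  # pair chosen by the neighbor-joining criterion
```

With these numbers the most similar pair is A and C, which are not neighbours on the true tree, whereas the neighbor-joining criterion picks a true neighbour pair. That is the point of "cluster ≠ phylogeny" when substitution rates differ between lineages.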
Parsimony analysis
- Construct the tree with the fewest changes
  - [Figure: character states A and C mapped onto two trees; one requires 1 change, the other 2 changes (parallel)]
- Find the shortest way through the data!
- Gap handling, recoding, step matrices
  - Simple = presence / absence
  - Recoding
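Counting changes on a given tree can be sketched with Fitch's algorithm for a single unordered character. The representation below (rooted binary trees as nested tuples of leaf names, and the leaf names `t1` to `t4`) is an assumption made for illustration only.

```python
def fitch(tree, states):
    """Return (possible_states, changes) for one character on a rooted
    binary tree given as nested tuples of leaf names."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_changes + right_changes
    # The children disagree: one change is required on this node's branches.
    return left_set | right_set, left_changes + right_changes + 1

# Hypothetical character: two OTUs carry A, two carry C.
STATES = {"t1": "A", "t2": "A", "t3": "C", "t4": "C"}
TREE_1 = (("t1", "t2"), ("t3", "t4"))   # groups identical states together
TREE_2 = (("t1", "t3"), ("t2", "t4"))   # forces parallel changes

print(fitch(TREE_1, STATES)[1])  # 1 change
print(fitch(TREE_2, STATES)[1])  # 2 changes
```

Summing this count over all characters gives the tree length; the parsimony criterion prefers the tree with the smallest total.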
Maximum likelihood analysis
- Describe model of evolution
  - Substitution rates, base frequencies
- Create tree, map characters to tree
  - Probability of tree (P_t) = product of the probabilities of the characters across the tree (equivalently, the sum of their log-probabilities)
- Determine probabilities of trees
- Compare probabilities
  - ΔP_t12 = P_t1 − P_t2: significant?
  - Given the evolutionary model
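As a rough sketch of how a likelihood is assembled from per-character probabilities, the example below uses the simplest possible case: two sequences joined by a single branch of length d under the Jukes-Cantor (JC69) model, with equal base frequencies. The function names and the toy sequences are hypothetical, and real ML programs apply the same idea to full trees and richer models.

```python
import math

def jc69_site_prob(x, y, d):
    """Probability of observing bases x and y at one site, separated by a
    branch of length d (expected substitutions per site) under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    p_xy = 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e
    return 0.25 * p_xy  # 0.25 = equilibrium frequency of base x

def log_likelihood(seq1, seq2, d):
    """Log-likelihood = sum over sites of the log site probabilities."""
    return sum(math.log(jc69_site_prob(x, y, d)) for x, y in zip(seq1, seq2))

def jc69_distance(seq1, seq2):
    """Branch length that maximises the likelihood (valid for p < 0.75)."""
    p = sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

SEQ_1 = "ACGTACGTACGTACGTACGT"
SEQ_2 = "ACGTACGTACGTACGTACAA"   # differs at 2 of 20 sites

d_hat = jc69_distance(SEQ_1, SEQ_2)
for d in (0.05, d_hat, 0.30):
    print(round(d, 4), round(log_likelihood(SEQ_1, SEQ_2, d), 3))
```

The log-likelihood is highest at the estimated branch length and lower for the two alternative values, which is the kind of comparison between hypotheses that the slide describes.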
ML vs. Bayesian likelihood
- ML searches for the best tree given the evolutionary model and observed data
  - Kishino-Hasegawa test compares the probabilities of trees
- Bayesian analysis, MCMC simulation
  - Create trees based on evolutionary model: prior probability
  - Determine likelihood of data given model
  - Optimal hypothesis = maximum posterior probability, which is proportional to prior probability of tree × likelihood of data
  - Determined for internal branches in the tree sample
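The relation posterior ∝ prior × likelihood can be illustrated with a handful of candidate trees whose log-likelihoods are already known. This sketch simply enumerates and normalises; it does not perform MCMC, which is what Bayesian programs use when the trees cannot be enumerated. The function name `posterior` and the numbers are hypothetical.

```python
import math

def posterior(log_likelihoods, priors=None):
    """Posterior probability of each candidate tree, given its log-likelihood
    and prior probability (uniform prior if none is supplied)."""
    names = list(log_likelihoods)
    if priors is None:
        priors = {t: 1.0 / len(names) for t in names}
    log_post = {t: math.log(priors[t]) + log_likelihoods[t] for t in names}
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    top = max(log_post.values())
    weights = {t: math.exp(v - top) for t, v in log_post.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

# Hypothetical log-likelihoods for two candidate trees.
print(posterior({"t1": -100.0, "t2": -102.0}))
# The same data with a prior that strongly favours t2.
print(posterior({"t1": -100.0, "t2": -102.0}, {"t1": 0.1, "t2": 0.9}))
```

With a uniform prior the tree with the higher likelihood also has the higher posterior; a sufficiently strong prior can reverse that ranking.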
Nice to know stuff
- Long branch attraction
- Support assessment
  - Decay/Bremer support (parsimony)
  - Consensus, bootstrap / jackknife
  - Confidence intervals
- Rooting of tree
  - Outgroup to polarise, define ancestral states
  - Midpoint
  - Unrooted
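Bootstrap support rests on resampling alignment columns with replacement and re-running the analysis on each pseudo-replicate. A minimal sketch of the resampling step, assuming the alignment is held as a dictionary of equal-length strings (the names and toy data are hypothetical):

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate: columns are drawn with
    replacement until the replicate is as long as the original alignment."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

ALIGNMENT = {"A": "ACGT", "B": "AGGT", "C": "TCGA"}
replicate = bootstrap_alignment(ALIGNMENT, random.Random(0))
print(replicate)
```

Each replicate keeps the columns intact (every OTU gets the same resampled sites), so the tree built from it can be compared with the tree from the original data. A jackknife differs only in that columns are deleted rather than resampled.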
Whole genome based phylogeny from a Fusarium point of view R K A H J
Fungal genomes
- Fewer than 50 complete fungal genome sequences
- An overview of Genome Databases: http://www3.oup.co.uk/nar/databases/c/
- 32 publicly available fungal genome databases (31.12.2007)
  - Some are listed more than once!
Fusarium genomes
- Three Fusarium genomes sequenced
- F. graminearum (2003)
  - The second plant-pathogenic fungal genome made publicly available
  - Size: ~40 Mb
  - Chromosomes: 4
  - Genes: 13,332
- F. verticillioides
  - Size: 41.8 Mb
  - Chromosomes: 12
  - Genes: 14,179
- F. oxysporum
  - Size: 61.4 Mb
  - Chromosomes: ?
  - Genes: 17,735
Fusarium genomes
- Two Fusarium genomes nominated as candidates at http://www.broad.mit.edu
  - F. proliferatum
  - F. solani (Nectria haematococca)
- Expressed sequence tag library
  - F. sporotrichioides: 7517 ESTs
Whole genome based phylogeny
- Most (all?) use protein sequences
  - Too much information in DNA sequences
  - Effectively impossible to establish phylogeny
- Strong selection criteria for proteins included in studies
  - Must be represented in all isolates studied
  - Many genes are not annotated
  - BLASTP to find homologous sequences
  - Excluding gene families with >1 representative
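The selection criteria above amount to keeping only gene families with exactly one member in every genome studied. A minimal sketch of that filter, assuming the families have already been built (for example from BLASTP hits) and are stored as lists of (species, protein id) pairs; the data structure, identifiers and species labels are hypothetical:

```python
def single_copy_families(families, species):
    """Keep families represented exactly once in every species.

    families: {family_id: [(species, protein_id), ...]}
    species:  collection of all species names in the study
    """
    wanted = sorted(set(species))
    kept = {}
    for family_id, members in families.items():
        found = sorted(sp for sp, _ in members)
        # Equal sorted lists means: no species missing, none duplicated.
        if found == wanted:
            kept[family_id] = members
    return kept

SPECIES = ["Fg", "Fv", "Tr"]
FAMILIES = {
    "fam1": [("Fg", "p1"), ("Fv", "p2"), ("Tr", "p3")],                # kept
    "fam2": [("Fg", "p4"), ("Fg", "p5"), ("Fv", "p6"), ("Tr", "p7")],  # paralogs
    "fam3": [("Fg", "p8"), ("Tr", "p9")],                              # missing Fv
}
print(sorted(single_copy_families(FAMILIES, SPECIES)))
```

Only `fam1` survives: `fam2` has more than one representative in a genome and `fam3` is not represented in all genomes.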
Whole genome based phylogeny
- Best approach for reconstructing genome phylogenies?
- Supertree methods vs. concatenated methods
Supertree methods
- Supertrees are phylogenies assembled from smaller phylogenies that share some, but not necessarily all, taxa in common.
- Supertrees can make novel statements about relationships of taxa that do not co-occur on any single input tree, while still retaining hierarchical information from the input trees.
Supertree methods
- Conventional studies
  - Source data: measurable attribute of an organism
  - Basic unit: character; can be viewed as a putative statement of relationship
- Supertrees
  - Source data: phylogenies
  - Basic unit: membership criterion / statement of relationship (branching topology); at best, can be viewed as a proxy for a shared derived character
Supertree construction
- Direct: consensus-like techniques
- Indirect: coding technique followed by an optimization criterion
- [Figure: source trees with overlapping sets of taxa (A to L) combined into a single supertree]
Supertree methods
- Direct
  - Strict consensus supertrees
  - MinCutSupertree (and variants)
  - Semi-strict supertrees
- Indirect
  - Most matrix representation (MR) supertrees
    - Parsimony (MRP and variants)
    - Compatibility (MRC)
    - Minimum flip supertrees (MRF)
    - Average consensus (MRD)
  - Gene tree parsimony
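For the indirect methods, the coding step can be sketched as follows, assuming the standard Baum-Ragan coding used in MRP: each clade of each source tree becomes one binary character, scored 1 for taxa inside the clade, 0 for other taxa on that tree, and ? for taxa absent from that tree. The nested-tuple tree representation and the function names are assumptions for illustration.

```python
def clades(tree):
    """Return (leaf_set, list_of_clades) for a rooted tree given as nested
    tuples of leaf names. Single leaves are not reported as clades."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found

def mrp_matrix(trees):
    """Code a set of source trees as a binary matrix (one string per taxon)."""
    all_taxa = sorted(set().union(*(clades(t)[0] for t in trees)))
    matrix = {taxon: "" for taxon in all_taxa}
    for tree in trees:
        leaves, found = clades(tree)
        for clade in found:
            if clade == leaves:
                continue  # the root clade carries no grouping information
            for taxon in all_taxa:
                if taxon not in leaves:
                    matrix[taxon] += "?"
                elif taxon in clade:
                    matrix[taxon] += "1"
                else:
                    matrix[taxon] += "0"
    return matrix

# Two hypothetical source trees sharing taxa B and C.
SOURCE_TREES = [(("A", "B"), "C"), (("B", "C"), "D")]
for taxon, row in mrp_matrix(SOURCE_TREES).items():
    print(taxon, row)
```

The resulting matrix is then analysed with the chosen optimization criterion (parsimony for MRP); that search step is not shown here.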
Concatenated methods
- Construct multiple concatenated protein sequence alignments
- Maximum likelihood analysis on the concatenated protein sequence alignments from multiple protein families
Concatenated methods
- Multiple sequence alignments
  - Each in principle coding for topology
- Concatenated sequence alignment
  - Corresponding to one very long protein
- Phylogenetic analysis of concatenated sequence alignment
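Concatenation itself is a simple operation once every gene family has been aligned separately. A minimal sketch, assuming each alignment is a dictionary of equal-length strings keyed by taxon and that every taxon is present in every alignment (the function name and toy sequences are hypothetical):

```python
def concatenate(alignments, taxa):
    """Join per-family alignments end to end, one long sequence per taxon."""
    parts = {taxon: [] for taxon in taxa}
    for index, alignment in enumerate(alignments):
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"alignment {index} has unequal sequence lengths")
        for taxon in taxa:
            if taxon not in alignment:
                raise ValueError(f"taxon {taxon} missing from alignment {index}")
            parts[taxon].append(alignment[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

FAMILY_1 = {"Fg": "MKT", "Fv": "MKS", "Tr": "MRT"}
FAMILY_2 = {"Fg": "LLAV", "Fv": "LLAV", "Tr": "LIAV"}
print(concatenate([FAMILY_1, FAMILY_2], ["Fg", "Fv", "Tr"]))
```

Raising an error for a missing taxon reflects the requirement that each family be represented in all genomes; padding with gaps would be an alternative design, but it is not what the slides describe.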
Whole genome based phylogeny
- An example: a dataset of 345,829 genes from 42 fungal genomes
  - F. graminearum and F. verticillioides included
A fungal phylogeny based on 42 complete genomes: supertree method
- ClustalW on the 5316 protein families
  - Manual adjustment of alignments not possible
- Only conserved alignment blocks used
  - Average alignment length reduced from 697 sites to 214 sites
- Permutation tail probability test
  - Better than random (P>0.001)
  - 511 alignments failed
- 4805 alignments used in phylogenetic analysis
A fungal phylogeny based on 42 complete genomes: supertree method
- MultiPhyl protein substitution models
- Reconstruct ML phylogeny for each gene family
- 100 bootstrap replicates on all 4805 alignments!
  - Results summarised: 70% majority-rule consensus
  - These results used as input in supertree analysis
- Supertree analysis using matrix representation with parsimony (MRP)
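Summarising bootstrap trees as a 70% majority-rule consensus comes down to counting how often each clade occurs and keeping those at or above the threshold. The sketch below shows only that counting step for rooted trees written as nested tuples; the function names and the ten toy trees are hypothetical, and assembling the retained clades into a tree is left out.

```python
from collections import Counter

def clade_sets(tree):
    """Return the set of clades (as frozensets of leaf names) in a rooted
    tree given as nested tuples, excluding single leaves and the root."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    found.discard(leaves(tree))
    return found

def majority_rule_clades(trees, threshold=0.7):
    """Map each clade found in at least `threshold` of the trees to its
    frequency among the trees."""
    counts = Counter()
    for tree in trees:
        counts.update(clade_sets(tree))
    return {clade: n / len(trees)
            for clade, n in counts.items() if n / len(trees) >= threshold}

# Hypothetical bootstrap sample: 8 trees agree, 2 show another topology.
BOOT_TREES = [(("A", "B"), ("C", "D"))] * 8 + [(("A", "C"), ("B", "D"))] * 2
for clade, support in majority_rule_clades(BOOT_TREES).items():
    print(sorted(clade), support)
```

Clades found in 80% of the replicates are kept, while the alternative groupings seen in only 20% fall below the 70% threshold and are dropped.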
MSSA supertree derived from 4,805 fungal gene families. Bootstrap scores for all nodes are displayed. Rhizopus oryzae has been selected as an outgroup. The Basidiomycota and Ascomycota phyla form distinct clades. Subphyla and class clades are highlighted.
A fungal phylogeny based on 42 complete genomes: concatenated method
- All proteins compared in FASTP to find orthologs
- Form multi-gene clusters of orthologs
  - Only clusters with exactly one member per species
  - 227 protein families
- Filtered out genes with no syntenic evidence
  - 153 gene families used for further studies
- Individual gene families aligned, manually adjusted and concatenated together
  - 38,000 amino acid alignment!
- ML phylogeny
Maximum likelihood phylogeny reconstructed using a concatenated alignment of 153 universally distributed fungal genes. The concatenated alignment contains 42 taxa and exactly 38,000 amino acid positions.
Phylogeny
- High degree of concordance between supertree method and concatenated method
- Fusarium forms a monophyletic group with Trichoderma reesei as closest sister group
- The inference agreed with previous single gene phylogeny studies
Sordariomycetes: genome vs. multiple gene phylogeny
- James et al., 2006. Nature 443: 818-822
  - 6 gene phylogeny of nearly 200 fungal species
Fungal phylogeny
- High degree of overall congruence between the two phylogenetic methods
- A closer look at Sordariomycetes
  - Supertree method: 4805 protein families
  - Bayesian analysis: 6 genes
Phylogeny: why, what & when?
Arne Holst-Jensen, National Veterinary Institute, Norway, arne.holst-jensen@vetinst.no
Phylogeny:
- The evolutionary history and line of descent of a taxon
- Usually reconstructed based on available data (characters)
- Applicable also outside biology
Why?
- Evolutionary relationships
  - Taxa
  - Character evolution
  - Systems biology
- Classification of taxa
- Identity verification
  - Identify diagnostic features
- Prediction of features
What?
- Characters
  - Phenotypic
  - Genotypic
- Entities, often termed OTUs
  - Operational taxonomic units
  - But principle widely applicable
What, continued
- Character types:
  - Two state: presence / absence
  - Multistate, e.g.:
    - DNA sequences (A, C, G, T, gaps)
    - Very long, long, medium, short, very short
  - Ordered, e.g.: very long, long, medium, ...
  - Unordered, e.g.: DNA sequences
  - Polymorphic or missing characters
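The practical difference between ordered and unordered multistate characters is how many steps a change between two states costs. A minimal sketch using the length categories from the slide; the function names and the specific ordering list are assumptions for illustration:

```python
# Assumed ordering of the length states, from shortest to longest.
LENGTH_STATES = ["very short", "short", "medium", "long", "very long"]

def unordered_cost(a, b):
    """Unordered character: any change between two states costs one step."""
    return 0 if a == b else 1

def ordered_cost(a, b, order=LENGTH_STATES):
    """Ordered character: a change must pass through intermediate states,
    so the cost is the number of positions separating the two states."""
    return abs(order.index(a) - order.index(b))

print(unordered_cost("very short", "very long"))  # 1 step
print(ordered_cost("very short", "very long"))    # 4 steps
print(unordered_cost("A", "G"))                   # DNA bases treated as unordered
```

A step matrix generalises both cases by listing an explicit cost for every pair of states.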
What, continued 2
- Principles of phylogenetic analysis
  - Distance based methods, e.g. N-J
    - Minimise distance across global tree
  - Parsimony based methods
    - Minimise number of steps globally
  - Maximum likelihood methods
    - Probability of tree with evolutionary model
- Cluster analysis ≠ phylogenetic analysis!
When?
- Establish or test evolutionary tree
  - Appropriate data available
- Revise classification
- Develop diagnostics
- Predict features of OTU(s)
- Play with real-life computer game!
- Rationalise resource usage
Data retrieval
http://fusarium.cbio.psu.edu/
http://srs.ebi.ac.uk/
RK_EF1a_SRS.fas
http://blast.ncbi.nlm.nih.gov/blast.cgi
New BlastN search