Australian Systematic Botany Australian Systematic Botany Society
Taxonomy, biogeography and evolution of plants
L. A. S. JOHNSON REVIEW

Construction and annotation of large phylogenetic trees

Michael J. Sanderson

Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA.

Australian Systematic Botany 20(4) 287-301 https://doi.org/10.1071/SB07006
Submitted: 28 February 2007  Accepted: 22 May 2007   Published: 5 September 2007

Abstract

Broad availability of molecular sequence data allows construction of phylogenetic trees with 1000s or even 10 000s of taxa. This paper reviews methodological, technological and empirical issues raised in phylogenetic inference at this scale. Numerous algorithmic and computational challenges have been identified surrounding the core problem of reconstructing large trees accurately from sequence data, but many other obstacles, both upstream and downstream of this step, are less well understood. Before phylogenetic analysis, data must be generated de novo or extracted from existing databases, compiled into blocks of homologous data with controlled properties, aligned, examined for the presence of gene duplications or other kinds of complicating factors, and finally, combined with other evidence via supermatrix or supertree approaches. After phylogenetic analysis, confidence assessments are usually reported, along with other kinds of annotations, such as clade names, or annotations requiring additional inference procedures, such as trait evolution or divergence time estimates. Prospects for partial automation of large-tree construction are also discussed, as well as risks associated with ‘outsourcing’ phylogenetic inference beyond the systematics community.


References


Aho AV, Sagiv Y, Szymanski TG, Ullman JD (1981) Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM Journal of Computing 10, 405–421.

Alfaro ME, Holder MT (2006) The posterior and the prior in Bayesian phylogenetics. Annual Review of Ecology Evolution and Systematics 37, 19–42.

Ammiraju JSS, Luo MZ, Goicoechea JL, Wang W, Kudrna D, Mueller C, Talag J, Kim HR, Sisneros NB, Blackmon B, Fang E, Tomkins JB, Brar D, MacKill D, McCouch S, Kurata N, Lambert G, Galbraith DW, Arumuganathan K, Rao K, Walling JG, Gill N, Yu Y, SanMiguel P, Soderlund C, Jackson S, Wing RA (2006) The Oryza bacterial artificial chromosome library resource: construction and analysis of 12 deep-coverage large-insert BAC libraries that represent the 10 genome types of the genus Oryza. Genome Research 16, 140–147.

Ané C, Eulenstein O, Piaggio-Talice R, Sanderson MJ (2006) Groves of phylogenetic trees. Technical Report. University of Wisconsin, Madison, WI.

Angiosperm Phylogeny Group (2003) An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II. Botanical Journal of the Linnean Society 141, 399–436.

Arvestad L, Berglund A-C, Lagergren J, Sennblad B (2003) Bayesian gene/species tree reconciliation and orthology analysis using MCMC. Bioinformatics 19 (Suppl. 1), i7–i15.

Bansal M, Burleigh JG, Eulenstein O, Wehe A (2007) Heuristics for the gene-duplication problem: an Ω(N) speed-up for the local search. RECOMB 2007.

Bapteste E, Brinkmann H, Lee JA, Moore DV, Sensen CW, Gordon P, Duruflé L, Gaasterland T, Lopez P, Müller M, Hervé P (2002) The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proceedings of the National Academy of Sciences of the United States of America 99, 1414–1419.

Baum BR (1992) Combining trees as a way of combining data sets for phylogenetic inference, and the desirability of combining gene trees. Taxon 41, 3–10.

Bender MA, Farach-Colton M (2000) The LCA problem revisited. Lecture Notes in Computer Science 1776, 88–94.

Bininda-Emonds ORP (2004a) The evolution of supertrees. Trends in Ecology & Evolution 19, 315–322.

Bininda-Emonds ORP (Ed.) (2004b) ‘Phylogenetic supertrees.’ (Kluwer: Boston)

Bininda-Emonds ORP, Brady SG, Kim J, Sanderson MJ (2001) Scaling of accuracy in extremely large phylogenetic trees. Pacific Symposium on Biocomputing 6, 547–558.

Bininda-Emonds ORP, Gittleman JL, Steel MA (2002) The (super)tree of life: procedures, problems, and prospects. Annual Review of Ecology and Systematics 33, 265–290.

Britton T (2005) Estimating divergence times in phylogenetic trees without a molecular clock. Systematic Biology 54, 500–507.

Britton T, Oxelman B, Vinnersten A, Bremer K (2002) Phylogenetic dating with confidence intervals using mean path lengths. Molecular Phylogenetics and Evolution 24, 58–65.

Burleigh JG, Driskell AC, Sanderson MJ (2006) Supertree bootstrapping methods for assessing phylogenetic variation among genes in genome-scale data sets. Systematic Biology 55, 426–440.

Burnham KP, Anderson DR (1998) ‘Model selection and inference.’ (Springer: New York)

Castresana J (2000) Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution 17, 540–552.

Chang JT (1996) Inconsistency of evolutionary tree topology reconstruction methods when substitution rates vary across characters. Mathematical Biosciences 134, 189–215.

Charalambous M, Trancoso P, Stamatakis A (2005) Initial experiences porting a bioinformatics application to a graphics processor. Lecture Notes in Computer Science 3746, 415–425.

Chase MW, Soltis DE, Olmstead RG, Morgan D, Les DH, Mishler BD, Duvall MR, Price RA, Hills HG, Qiu Y-L, Kron KA, Rettig JH, Conti E, Palmer JD, Manhart JR, Sytsma KJ, Michaels HJ, Kress WJ, Karol KG, Clark WD, Hedrén M, Gaut BS, Jansen RK, Kim K-J, Wimpee CF, Smith JF, Furnier GR, Strauss SH, Xiang Q-Y, Plunkett GM, Soltis PS, Swensen SM, Williams SE, Gadek PA, Quinn CJ, Eguiarte LE, Golenberg E, Learn GH, Graham SW, Barrett SCH, Dayanandan S, Albert VA (1993) Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL. Annals of the Missouri Botanical Garden 80, 528–580.

Chave J, Muller-Landau HC, Baker TR, Easdale TA, Ter Steege H, Webb CO (2006) Regional and phylogenetic variation of wood density across 2456 neotropical tree species. Ecological Applications 16, 2356–2367.

Chevenet F, Brun C, Banuls AL, Jacq B, Christen R (2006) TreeDyn: towards dynamic graphics and annotations for analyses of trees. BMC Bioinformatics 7,

Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P (2006) Toward automatic reconstruction of a highly resolved tree of life. Science 311, 1283–1287.

Cunningham CW (1997) Can three incongruence tests predict when data should be combined? Molecular Biology and Evolution 14, 733–740.

Davies TJ, Barraclough TG, Chase MW, Soltis PS, Soltis DE, Savolainen V (2004) Darwin’s abominable mystery: insights from a supertree of the angiosperms. Proceedings of the National Academy of Sciences, USA 101, 1904–1909.

Dong QF, Kroiss L, Oakley FD, Wang BB, Brendel V (2005) Comparative EST analyses in plant systems. Methods in Enzymology 395, 400–419.

Driskell AC, Ané C, Burleigh JG, McMahon MM, O’Meara B, Sanderson MJ (2004) Prospects for building the tree of life from large sequence databases. Science 306, 1172–1174.

Drummond AJ, Ho SYW, Phillips MJ, Rambaut A (2006) Relaxed phylogenetics and dating with confidence. PLoS Biology 4, e88.

Du ZH, Lin F (2006) pNJTree: a parallel program for reconstruction of neighbor-joining tree and its application in ClustalW. Parallel Computing 32, 441–446.

Du ZH, Lin F, Roshan UW (2005) Reconstruction of large phylogenetic trees: a parallel approach. Computational Biology and Chemistry 29, 273–280.

Durand D, Halldorsson BV, Vernot B (2005) A hybrid micro-macroevolutionary approach to gene tree reconstruction. Lecture Notes in Computer Science 3500, 250–264.

Farris JS, Källersjö M, Kluge AG, Bult C (1994) Testing significance of incongruence. Cladistics 10, 315–319.

Farris JS, Källersjö M, Kluge AG, Bult C (1995) Constructing a significance test for incongruence. Systematic Biology 44, 570–572.

Felsenstein J (1978) Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 27, 401–410.

Godfray HCJ, Knapp S (2004) Taxonomy for the twenty-first century—Introduction. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 359, 559–569.

Goloboff PA (1999) Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15, 415–428.

Goodman M, Czelusniak J, Moore GW, Romero-Herrera AE, Matsuda G (1979) Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences. Systematic Zoology 28, 132–163.

Goremykin VV, Hansmann S, Martin WF (1997) Evolutionary analysis of 58 proteins encoded in six completely sequenced chloroplast genomes: revised molecular estimates of two seed plant divergence times. Plant Systematics and Evolution 206, 337–351.

Goremykin V, Hirsch-Ernst K, Wolfl S, Hellwig F (2003) Analysis of the Amborella trichopoda chloroplast genome sequence suggests that Amborella is not a basal angiosperm. Molecular Biology and Evolution 20, 1499–1505.

Graybeal A (1998) Is it better to add taxa or characters to a difficult phylogenetic problem? Systematic Biology 47, 9–17.

Grotkopp E, Rejmanek M, Sanderson MJ, Rost TL (2004) Evolution of genome size in pines (Pinus) and its life-history correlates: supertree analyses. Evolution 58, 1705–1729.

Hardy CR, Linder HP (2005) Intraspecific variability and timing in ancestral ecology reconstruction: a test case from the Cape flora. Systematic Biology 54, 299–316.

Hibbett D, Nilsson R, Snyder M, Fonseca M, Costanzo J, Shonfeld M (2005) Automated phylogenetic taxonomy: an example in the homobasidiomycetes (mushroom-forming fungi). Systematic Biology 54, 660–668.

Hillis DM (1996) Inferring complex phylogenies. Nature 383, 130–131.

Hillis DM, Huelsenbeck JP, Cunningham CW (1994) Application and accuracy of molecular phylogenies. Science 264, 671–677.

Huelsenbeck JP (1997) Is the Felsenstein zone a fly trap? Systematic Biology 46, 69–74.

Hughes T, Hyun Y, Liberles DA (2004) Visualizing very large phylogenetic trees in three dimensional hyperbolic space. BMC Bioinformatics 5, 48–53.

Huson DH, Nettles SM, Warnow TJ (1999) Disk-covering, a fast-converging method for phylogenetic tree reconstruction. Journal of Computational Biology 6, 369–386.

Janssen T, Bremer K (2004) The age of major monocot groups inferred from 800+ rbcL sequences. Botanical Journal of the Linnean Society 146, 385–398.

Jeffroy O, Brinkmann H, Delsuc F, Philippe H (2006) Phylogenomics: the beginning of incongruence? Trends in Genetics 22, 225–231.

Johnson LAS (1970) Rainbow’s end: the quest for an optimal taxonomy. Systematic Zoology 19, 203–239.

Källersjö M, Farris JS, Chase MW, Bremer B, Fay MF, Humphries CJ, Petersen G, Seberg O, Bremer K (1998) Simultaneous parsimony jackknife analysis of 2538 rbcL DNA sequences reveals support for major clades of green plants, land plants, seed plants and flowering plants. Plant Systematics and Evolution 213, 259–287.

Keane TM, Page AJ, Naughton TJ, Travers SAA, McInerney JO (2006) Building large phylogenetic trees on coarse-grained parallel machines. Algorithmica 45, 285–300.

Kim J (1998) Large-scale phylogenies and measuring the performance of phylogenetic estimators. Systematic Biology 47, 43–60.

Kluge AG (1989) A concern for evidence and a phylogenetic hypothesis of relationships among Epicrates (Boidae, Serpentes). Systematic Zoology 38, 7–25.

Kolaczkowski B, Thornton JW (2004) Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431, 980–984.

Lassmann T, Sonnhammer ELL (2002) Quality assessment of multiple alignment programs. FEBS Letters 529, 126–130.

Laurin M, de Queiroz K, Cantino P, Cellinese N, Olmstead R (2005) The PhyloCode, types, ranks and monophyly: a response to Pickett. Cladistics 21, 605–607.

Lavin M, Herendeen PS, Wojciechowski MF (2005) Evolutionary rates analysis of Leguminosae implicates a rapid diversification of lineages during the Tertiary. Systematic Biology 54, 575–594.

Leebens-Mack J, Raubeson LA, Cui L, Kuehl JV, Fourcade MH, Chumley TW, Boore JL, Jansen RK, de Pamphilis CW (2005) Identifying the basal angiosperms node in chloroplast genome phylogenies: sampling one’s way out of the Felsenstein zone. Molecular Biology and Evolution 22, 1948–1963.

Ley RE, Backhed F, Turnbaugh P, Lozupone CA, Knight RD, Gordon JI (2005) Obesity alters gut microbial ecology. Proceedings of the National Academy of Sciences, USA 102, 11 070–11 075.

Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar, Buchner A, Lai T, Steppi S, Jobb G, Förster W, Brettske I, Gerber S, Ginhart AW, Gross O, Grumann S, Hermann S, Jost R, König A, Liss T, Lüßmann R, May M, Nonhoff B, Reichel B, Strehlow R, Stamatakis A, Stuckmann N, Vilbig A, Lenke M, Ludwig T, Bode A, Schleifer K-H (2004) ARB: a software environment for sequence data. Nucleic Acids Research 32, 1363–1371.

Mabberley DJ (1987) ‘The plant book.’ (Cambridge University Press: Cambridge, UK)

Maddison DR, Schulz K-S (1996–2007) ‘The tree of life web project.’ 2006 http://tolweb.org [verified 17 July 2007].

Maddison WP, Maddison DR (2000) ‘MacClade 4: analysis of phylogeny and character evolution.’ (Sinauer: Sunderland, MA)

Maddison WP, Maddison DR (2007) Mesquite: a modular system for evolutionary analysis. http://mesquiteproject.org/mesquite/mesquite.html [verified 17 July 2007].

McCubbin AG, Roalson EH (2005) Construction of bacterial artificial chromosome libraries for use in phylogenetic studies. Methods in Enzymology 395, 384–400.

McMahon MM, Sanderson MJ (2006) Phylogenetic supermatrix analysis of GenBank sequences from 2228 papilionoid legumes. Systematic Biology 55, 818–836.

Minh BQ, Vinh LS, von Haeseler A, Schmidt HA (2005) pIQPNNI: parallel reconstruction of large maximum likelihood phylogenies. Bioinformatics 21, 3794–3796.

Moles A, Ackerly D, Webb C, Tweddle J, Dickie J, Westoby M (2005) A brief history of seed size. Science 307, 576–580.

Moore B, Smith S, Donoghue MJ (2006) Increasing data transparency and estimating phylogenetic uncertainty in supertrees: approaches using nonparametric bootstrapping. Systematic Biology 55, 662–676.

Mort ME, Soltis PS, Soltis DE, Mabry ML (2000) Comparison of three methods for estimating internal support on phylogenetic trees. Systematic Biology 49, 160–171.

Mossel E (2007) Distorted metrics on trees and phylogenetic forests. IEEE/ACM Transactions on Computational Biology and Bioinformatics 4, 108–116.

Mossel E, Steel M (2005) How much can evolved characters tell us about the tree that generated them? In ‘Mathematics of evolution and phylogeny’. (Eds O Gascuel, M Steel) pp. 384–412. (Oxford University Press: New York)

Mower JP, Stefanovic S, Young GJ, Palmer JD (2004) Plant genetics—Gene transfer from parasitic to host plants. Nature 432, 165–166.

Munzner T (1998) Exploring large graphs in 3D hyperbolic space. IEEE Computer Graphics and Applications 18, 18–23.

Munzner T, Guimbretiere F, Tasiran S, Zhang L, Zhou YH (2003) TreeJuxtaposer: scalable tree comparison using Focus+Context with guaranteed visibility. ACM Transactions on Graphics 22, 453–462.

Myers DS, Cummings MP (2003) Necessity is the mother of invention: a simple grid computing system using commodity tools. Journal of Parallel and Distributed Computing 63, 578–589.

Nilsson RH, Rajashekar B, Larsson KH, Ursing BM (2004) GalaxieEST: addressing EST identity through automated phylogenetic analysis. BMC Bioinformatics 5,

Page RDM (1998) GeneTree: comparing gene and species phylogenies using reconciled trees. Bioinformatics 14, 819–820.

Page RDM, Charleston MA (1998) Trees within trees: phylogeny and historical associations. Trends in Ecology & Evolution 13, 356–359.

Parmentier G, Trystram D, Zola J (2006) Large scale multiple sequence alignment with simultaneous phylogeny inference. Journal of Parallel and Distributed Computing 66, 1534–1545.

Qiu Y-L, Lee J, Bernasconi-Quadroni F, Soltis DE, Soltis PS, Zanis M, Zimmer EA, Chen Z, Savolainen V, Chase MW (1999) The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes. Nature 402, 404–407.

Qiu YL, Dombrovska O, Lee J, Li L, Whitlock BA, Bernasconi-Quadroni F, Rest JS, Davis CC, Borsch T, Hilu KW, Renner SS, Soltis DE, Soltis PS, Zanis MJ, Cannone JJ, Gutell RR, Powell M, Savolainen V, Chatrou LW, Chase MW (2005) Phylogenetic analyses of basal angiosperms based on nine plastid, mitochondrial, and nuclear genes. International Journal of Plant Sciences 166, 815–842.

de Queiroz A, Donoghue MJ, Kim J (1995) Separate versus combined analysis of phylogenetic evidence. Annual Review of Ecology and Systematics 26, 657–681.

Rice KA, Donoghue MJ, Olmstead RG (1997) Analyzing large data sets: rbcL 500 revisited. Systematic Biology 46, 554–563.

Robbertse B, Reeves JB, Schoch CL, Spatafora JW (2006) A phylogenomic analysis of the Ascomycota. Fungal Genetics and Biology 43, 715–725.

Rokas A, Williams B, King N, Carroll S (2003) Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425, 798–804.

Ross HA, Lento GM, Dalebout ML, Goode M, Ewing G, McLaren P, Rodrigo AG, Lavery S, Baker CS (2003) DNA Surveillance: web-based molecular identification of whales, dolphins and porpoises. Journal of Heredity 94, 111–114.

Rutschmann F (2006) Molecular dating of phylogenetic trees: a brief review of current methods that estimate divergence times. Diversity & Distributions 12, 35–48.

Salamin N, Hodkinson TR, Savolainen V (2002) Building supertrees: an empirical assessment using the grass family (Poaceae). Systematic Biology 51, 136–150.

Salamin N, Chase MW, Hodkinson TR, Savolainen V (2003) Assessing internal support with large phylogenetic DNA matrices. Molecular Phylogenetics and Evolution 27, 528–539.

Sanderson MJ (2006) Paloverde: an OpenGL 3D phylogeny browser. Bioinformatics 22, 1004–1006.

Sanderson MJ, McMahon MM (2007) Inferring angiosperm phylogeny from EST data with widespread gene duplication. BMC Evolutionary Biology 7 (Suppl. 1), S3.

Sanderson MJ, Wojciechowski MF (2000) Improved bootstrap confidence limits in large-scale phylogenies, with an example from Neo-Astragalus (Leguminosae). Systematic Biology 49, 671–685.

Sanderson MJ, Wojciechowski MF, Hu JM, Khan TS, Brady SG (2000) Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. Molecular Biology and Evolution 17, 782–797.

Sanderson MJ, Driskell AC, Ree RH, Eulenstein O, Langley S (2003) Obtaining maximal concatenated phylogenetic data sets from large sequence databases. Molecular Biology and Evolution 20, 1036–1042.

Sanderson MJ, Ané C, Eulenstein O, Fernandez-Baca D, Kim J, McMahon MM, Piaggio-Talice R (2007) Fragmentation of large data sets in phylogenetic analysis. In ‘Mathematics of evolution and phylogeny II’. (Eds O Gascuel, M Steel) (Oxford University Press: Oxford)

Schlueter JA, Dixon P, Granger C, Grant D, Clark L, Doyle JJ, Shoemaker RC (2004) Mining EST databases to resolve evolutionary events in major crop species. Genome 47, 868–876.

Schmidt HA, Strimmer K, Vingron M, von Haeseler A (2002) TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502–504.

Semple C, Daniel P, Hordijk W, Page RDM, Steel M (2004) Supertree algorithms for ancestral divergence dates and nested taxa. Bioinformatics 20, 2355–2360.

Shimodaira H (2002) An approximately unbiased test of phylogenetic tree selection. Systematic Biology 51, 492–508.

Sneath P, Sokal R (1973) ‘Numerical taxonomy.’ (WH Freeman and Co.: San Francisco)

Soltis DE, Soltis PS, Nickrent DL, Johnson LA, Hahn WJ, Hoot SB, Sweere JA (1997) Angiosperm phylogeny inferred from 18S ribosomal sequences. Annals of the Missouri Botanical Garden 84, 1–49.

Soltis PS, Soltis DE, Wolf PG, Nickrent DL, Chaw S-M, Chapman RL (1999) The phylogeny of land plants inferred from 18S rDNA sequences: pushing the limits of rDNA signal? Molecular Biology and Evolution 16, 1774–1784.

Stamatakis A (2006) RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690.

Stamatakis A, Ludwig T, Meier H (2005) RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21, 456–463.

Storm CEV, Sonnhammer ELL (2002) Automated ortholog inference from phylogenetic trees and calculation of orthology reliability. Bioinformatics 18, 92–99.

Tehler A, Little DP, Farris JS (2003) The full-length phylogenetic tree from 1551 ribosomal sequences of chitinous fungi. Mycological Research 107, 901–916.

Till M, Zhou BB, Zomaya A, Jermiin LS (2004) Phylogenetic analysis using maximum likelihood methods in homogeneous parallel environments. Lecture Notes in Computer Science 3320, 274–279.

de la Torre J, Egan M, Katari M, Brenner E, Stevenson D, Coruzzi G, Desalle R (2006) ESTimating plant phylogeny: lessons from partitioning. BMC Evolutionary Biology 6, 48.

Vilgalys R (2003) Taxonomic misidentification in public DNA databases. New Phytologist 160, 4–5.

Vogl C, Badger J, Kearney P, Li M, Clegg M, Jian T (2003) Probabilistic analysis indicates discordant gene trees in chloroplast evolution. Journal of Molecular Evolution 56, 330–340.

Walters JD, Casavant TL, Robinson JP, Bair TB, Braun TA, Scheetz TE (2005) XenoCluster: a grid computing approach to finding ancient evolutionary genetic anomalies. Lecture Notes in Computer Science 3606, 355–366.

Webb CO, Donoghue MJ (2005) Phylomatic: tree assembly for applied phylogenetics. Molecular Ecology Notes 5, 181–183.

Webb CO, Losos JB, Agrawal AA (2006) Integrating phylogenies into community ecology. Ecology 87, S1–S2.

Wojciechowski MF, Sanderson MJ, Steel KP, Liston A (2000) Molecular phylogeny of the ‘temperate herbaceous tribes’ of papilionoid legumes: a supertree approach. In ‘Advances in legume systematics’. (Eds PS Herendeen, A Bruneau) pp. 277–298. (Royal Botanic Gardens, Kew: London)

Yan CH, Burleigh JG, Eulenstein O (2005) Identifying optimal incomplete phylogenetic data sets from sequence databases. Molecular Phylogenetics and Evolution 35, 528–535.

Yang ZH, Rannala B (2006) Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Molecular Biology and Evolution 23, 212–226.

Yesson C, Culham A (2006) A phyloclimatic study of Cyclamen. BMC Evolutionary Biology 6, 72.

Zwickl DJ (2006) Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. PhD Dissertation, University of Texas at Austin, Austin, TX.


1 Reflecting on one of many raging arguments over phenetic systematics in the late 1960s, L.A.S. Johnson argued that problems of homology (‘matching’) would not all be whisked away by large oceans of data: ‘...even if we knew the entire nucleotide sequences over a set of organisms we should still have to make many decisions on matching...’ (Johnson 1970: p. 227, based on his presidential address for the Linnean Society of New South Wales in 1968). At the time, the prospects for studying such complete genome sequences must have seemed remote. Now the data are here, and the newest genomics technologies (e.g. 454 Life Sciences’s FLX system) promise to deliver 100 million base pairs of sequence in an eight-hour run (50 chloroplast genomes or one entire Arabidopsis genome...). However, the number of ‘decisions’ to be made regarding the analysis of such data has grown along with the quantity of information.

