Least squares inference in phylogeny

From Wikipedia, the free encyclopedia

Least squares inference in phylogeny generates a phylogenetic tree based on an observed matrix of pairwise genetic distances and, optionally, a weight matrix. The goal is to find a tree whose path lengths match the observed distances as closely as possible.

The discrepancy between the observed pairwise distances $D_{ij}$ and the distances $T_{ij}$ over a phylogenetic tree (i.e. the sum of the branch lengths on the path from leaf i to leaf j) is measured by

$$S = \sum_{ij} w_{ij} (D_{ij} - T_{ij})^2$$

where the weights $w_{ij}$ depend on the least squares method used. Least squares distance tree construction aims to find the tree (topology and branch lengths) with minimal $S$. This is a non-trivial problem. It involves searching the discrete space of unrooted binary tree topologies, whose size grows faster than exponentially with the number of leaves: for n leaves there are $1 \cdot 3 \cdot 5 \cdots (2n-5)$ different topologies. Exhaustive enumeration becomes infeasible even for a modest number of leaves, so heuristic search methods are used to find a reasonably good topology. For a given topology, evaluating $S$ (which includes computing the branch lengths) is a linear least squares problem.

There are several ways to weight the squared errors $(D_{ij} - T_{ij})^2$, depending on what is known or assumed about the variances of the observed distances. When nothing is known about the errors, or if they are assumed to be independent with equal variance for all observed distances, all the weights $w_{ij}$ are set to one. This leads to an ordinary least squares estimate. In the weighted least squares case the errors are assumed to be independent (or their correlations are not known). Given independent errors, each weight should ideally be set to the inverse of the variance of the corresponding distance estimate. Sometimes the variances are not known, but they can be modeled as a function of the distance estimates. The Fitch and Margoliash method [1], for instance, assumes that the variances are proportional to the squared distances, so that $w_{ij} = 1/D_{ij}^2$.
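To make the fixed-topology step concrete, here is a minimal Python sketch (assuming NumPy) that encodes one topology as a 0/1 path matrix A, so that the tree distances are T = A b for a vector b of branch lengths, and then fits b by ordinary or Fitch-Margoliash-weighted least squares. The path-matrix encoding, the function names `num_unrooted_topologies` and `fit_branch_lengths`, and the quartet data are illustrative assumptions, not the implementation used by any particular package. The sum in S runs over each unordered pair once (which only rescales S by a constant), and branch lengths are not constrained to be non-negative.

```python
import numpy as np
from itertools import combinations


def num_unrooted_topologies(n):
    """Count unrooted binary topologies on n >= 3 labelled leaves: 1 * 3 * 5 * ... * (2n - 5)."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


def fit_branch_lengths(D, A, method="ols"):
    """Fit branch lengths for one fixed topology by (weighted) least squares.

    D: (n, n) symmetric matrix of observed distances D_ij.
    A: (n_pairs, n_branches) 0/1 path matrix; row p has a 1 for every branch on
       the path of the p-th leaf pair, pairs ordered as combinations(range(n), 2).
    method: "ols" (all w_ij = 1) or "fm" (Fitch-Margoliash style w_ij = 1 / D_ij^2).
    Returns (b, S): fitted branch lengths and the weighted sum of squares S.
    """
    pairs = list(combinations(range(D.shape[0]), 2))
    d = np.array([D[i, j] for i, j in pairs])
    if method == "ols":
        w = np.ones_like(d)
    elif method == "fm":
        w = 1.0 / d ** 2  # assumes every off-diagonal distance is positive
    else:
        raise ValueError(f"unknown method: {method}")
    # Minimising sum w * (d - A b)^2 is ordinary least squares on rows scaled by sqrt(w).
    sw = np.sqrt(w)
    b, *_ = np.linalg.lstsq(A * sw[:, None], d * sw, rcond=None)
    T = A @ b  # tree distances T_ij implied by the fitted branch lengths
    S = float(np.sum(w * (d - T) ** 2))
    return b, S


# Hypothetical quartet topology ((0,1),(2,3)): branches 0-3 are the pendant
# edges of leaves 0-3 and branch 4 is the single internal edge.
A = np.array([
    [1, 1, 0, 0, 0],  # pair (0,1)
    [1, 0, 1, 0, 1],  # pair (0,2)
    [1, 0, 0, 1, 1],  # pair (0,3)
    [0, 1, 1, 0, 1],  # pair (1,2)
    [0, 1, 0, 1, 1],  # pair (1,3)
    [0, 0, 1, 1, 0],  # pair (2,3)
], dtype=float)

# Made-up distances generated exactly from branch lengths 1, 2, 3, 4, 5.
D = np.array([
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
], dtype=float)

b, S = fit_branch_lengths(D, A)  # recovers approximately [1, 2, 3, 4, 5] with S near 0

D_noisy = D.copy()
D_noisy[0, 2] = D_noisy[2, 0] = 9.5
b_noisy, S_noisy = fit_branch_lengths(D_noisy, A)  # S_noisy is 0.0625 up to rounding
```

After perturbing a single distance, no choice of branch lengths on this topology reproduces all six distances, so S becomes positive. A heuristic topology search would repeat such a fit for each candidate topology it visits and keep the one with the smallest S.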

The ordinary and weighted least squares methods described above assume independent distance estimates. If the distances are derived from genomic data, their estimates covary, because evolutionary events on internal branches (of the true tree) can push several distances up or down at the same time. The resulting covariances can be taken into account using the method of generalized least squares, i.e. by minimizing the following quantity

$$\sum_{ij,kl} w_{ij,kl} (D_{ij} - T_{ij})(D_{kl} - T_{kl})$$

where $w_{ij,kl}$ are the entries of the inverse of the covariance matrix of the distance estimates.
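A minimal sketch of the generalized least squares fit for one fixed topology, assuming NumPy and a 0/1 path matrix A so that the tree distances are T = A b. The function name `gls_branch_lengths`, the quartet topology, the distances, and especially the covariance matrix are hypothetical: the covariance below is a toy model (distances whose paths share branches covary, plus independent noise) chosen only so the example runs, whereas in practice it would have to be estimated from the data.

```python
import numpy as np


def gls_branch_lengths(d, A, cov):
    """Generalized least squares branch lengths for one fixed topology.

    d:   vector of observed distances D_ij, one entry per leaf pair.
    A:   0/1 path matrix, so the tree distances are T = A @ b.
    cov: covariance matrix of the distance estimates (assumed known, positive definite).
    Returns (b, S) with S = sum over ij,kl of w_ij,kl (D_ij - T_ij)(D_kl - T_kl),
    where the weights w_ij,kl are the entries of inv(cov).
    """
    W = np.linalg.inv(cov)
    # Setting the gradient of (d - A b)^T W (d - A b) to zero gives the
    # normal equations (A^T W A) b = A^T W d.
    b = np.linalg.solve(A.T @ W @ A, A.T @ W @ d)
    r = d - A @ b  # residuals D_ij - T_ij
    S = float(r @ W @ r)
    return b, S


# Hypothetical quartet topology ((0,1),(2,3)); rows are the leaf pairs
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3); branches 0-3 are pendant edges
# and branch 4 is the internal edge.
A = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
], dtype=float)
d = np.array([3.0, 9.5, 10.0, 10.0, 11.0, 7.0])  # made-up observed distances

# Toy covariance model (an assumption for illustration only): entry (p, q)
# grows with the number of branches shared by the paths of pairs p and q.
cov = 0.01 * (A @ A.T) + 0.01 * np.eye(len(d))
b, S = gls_branch_lengths(d, A, cov)
```

When the covariance matrix is diagonal, its inverse holds the reciprocal variances on the diagonal, and the fit reduces to the weighted least squares case with $w_{ij}$ equal to the inverse variance of each distance.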

Software

  • PHYLIP, a freely distributed phylogenetic analysis package containing an implementation of the weighted least squares method
  • PAUP, a similar package available for purchase
  • Darwin, a programming environment with a library of functions for statistics, numerics, sequence and phylogenetic analysis

References

  1. Fitch WM, Margoliash E. (1967). Construction of phylogenetic trees. Science 155: 279-284.