Main features

DARwin is organized into several independent and complementary sections. Each section manages its own input data, which can be inherited from other sections or imported directly from other sources. Standard data formats are supported for both input and output.


Various dissimilarity and distance indices are available for different data types: quantitative, qualitative, binary, DNA sequences, and others. The properties of dissimilarities are examined in detail, and transformations are offered to restore suitable properties where needed.
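To make "properties of dissimilarities" and "transformations" concrete, here is a minimal sketch, assuming binary data and using the simple matching dissimilarity as the index, the triangle inequality as the property being checked, and an additive constant as the restoring transformation. The function names are hypothetical and this is not DARwin's own implementation; it only illustrates the kind of check and correction described above.

```python
from itertools import permutations


def simple_matching_dissimilarity(x, y):
    """Proportion of binary variables on which two units differ."""
    if len(x) != len(y):
        raise ValueError("units must have the same number of variables")
    mismatches = sum(1 for a, b in zip(x, y) if a != b)
    return mismatches / len(x)


def dissimilarity_matrix(units, d):
    """Full symmetric matrix of pairwise dissimilarities."""
    n = len(units)
    return [[d(units[i], units[j]) for j in range(n)] for i in range(n)]


def violates_triangle_inequality(D, tol=1e-12):
    """Return the first triple (i, j, k) with D[i][j] > D[i][k] + D[k][j], else None."""
    for i, j, k in permutations(range(len(D)), 3):
        if D[i][j] > D[i][k] + D[k][j] + tol:
            return (i, j, k)
    return None


def additive_constant(D):
    """Smallest c >= 0 such that adding c to every off-diagonal entry makes D metric."""
    c = 0.0
    for i, j, k in permutations(range(len(D)), 3):
        c = max(c, D[i][j] - D[i][k] - D[k][j])
    return c


def add_constant(D, c):
    """Add c to all off-diagonal entries, keeping a zero diagonal."""
    n = len(D)
    return [[D[i][j] + c if i != j else 0.0 for j in range(n)] for i in range(n)]


# Usage: a non-metric dissimilarity (4 > 1 + 1) is made metric by adding c = 2.
bad = [[0, 1, 4], [1, 0, 1], [4, 1, 0]]
c = additive_constant(bad)
print(violates_triangle_inequality(bad), c, violates_triangle_inequality(add_constant(bad, c)))
```

The additive constant always works because adding c to every pair turns the condition d(i,j) <= d(i,k) + d(k,j) into d(i,j) - d(i,k) - d(k,j) <= c, so taking c as the largest violation fixes every triple at once.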

Factorial analysis

Principal Coordinate Analysis produces graphical representations on Euclidean planes that preserve the distances between units as well as possible.
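The standard Principal Coordinate Analysis computation (classical Gower scaling) can be sketched as follows, assuming NumPy is available: the squared dissimilarities are double-centred, the resulting matrix is eigendecomposed, and each unit's coordinates on an axis are the eigenvector entries scaled by the square root of the eigenvalue. The function name `pcoa` is hypothetical; this is a textbook sketch rather than DARwin's code.

```python
import numpy as np


def pcoa(D, n_axes=2):
    """Classical Principal Coordinate Analysis of a dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # sort axes by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Negative eigenvalues signal a non-Euclidean dissimilarity; clip them for plotting.
    kept = np.clip(eigvals[:n_axes], 0.0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    return coords, eigvals


# Usage: three units whose dissimilarities are the sides of a 3-4-5 right triangle.
D = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
coords, eigvals = pcoa(D)
print(coords)
```

Because these three distances come from points in a plane, two axes reproduce them exactly (up to rotation and reflection); for non-Euclidean dissimilarities some eigenvalues become negative, which is one reason the Euclidean property of a dissimilarity matters.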

Tree construction

Tree construction methods include hierarchical trees with various aggregation criteria (weighted or unweighted), the Neighbor-Joining tree (weighted or unweighted) and the Scores method. Ordinal extensions of NJTree and Scores aim to reduce sensitivity to errors in the data. NJTree under topological constraints makes it possible to enforce a tree structure known a priori for some subsets of the data. Bootstrapping in NJTree and an original method for detecting influential units can be used to estimate how well the tree is supported by the data.
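As an illustration of one of the listed methods, here is a minimal sketch of the classic (unweighted) Saitou-Nei Neighbor-Joining algorithm. At each step it joins the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of i's dissimilarities to the other remaining nodes, then replaces the pair by a new node. The function name and the Newick string output are assumptions made for the example; the weighted NJ variant and DARwin's own implementation are not reproduced here.

```python
def neighbor_joining(matrix, labels):
    """Classic unweighted Neighbor-Joining; returns an unrooted Newick string."""
    if len(labels) < 3:
        raise ValueError("at least three units are required")
    # Dissimilarities as a dict of dicts keyed by node name (leaf label or subtree string).
    d = {x: {y: matrix[i][j] for j, y in enumerate(labels)} for i, x in enumerate(labels)}
    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(d[x][y] for y in nodes if y != x) for x in nodes}
        # Pick the pair minimizing the Q criterion.
        best = None
        for i, x in enumerate(nodes):
            for y in nodes[i + 1:]:
                q = (n - 2) * d[x][y] - r[x] - r[y]
                if best is None or q < best[0]:
                    best = (q, x, y)
        _, x, y = best
        # Branch lengths from the two joined nodes to the new internal node.
        lx = 0.5 * d[x][y] + (r[x] - r[y]) / (2 * (n - 2))
        ly = d[x][y] - lx
        u = f"({x}:{lx:g},{y}:{ly:g})"
        # Dissimilarities from the new node to every remaining node.
        d[u] = {u: 0.0}
        for z in nodes:
            if z not in (x, y):
                d[u][z] = d[z][u] = 0.5 * (d[x][z] + d[y][z] - d[x][y])
        nodes = [z for z in nodes if z not in (x, y)] + [u]
    # Join the last three nodes to a central node.
    x, y, z = nodes
    lx = (d[x][y] + d[x][z] - d[y][z]) / 2
    ly = (d[x][y] + d[y][z] - d[x][z]) / 2
    lz = (d[x][z] + d[y][z] - d[x][y]) / 2
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"


D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbor_joining(D, list("abcde")))  # prints (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

For this additive example the branch lengths reproduce the input dissimilarities exactly; on noisy data NJ can produce negative branch lengths, which this sketch leaves as-is.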


Shortcut edges can be added to a tree to account for local horizontal transfers or hybridizations; the diversity structure then becomes a network rather than a tree.
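In a network the distance between two units is the length of the shortest path joining them, so a shortcut edge can bring the induced distances closer to the observed dissimilarities. The sketch below illustrates only that effect: it computes shortest-path distances with Floyd-Warshall and compares the squared error of a tree with and without one added edge. The tree, the B-C edge, its length and the observed values are hypothetical, and the code does not reproduce how DARwin selects or fits shortcut edges.

```python
import math
from itertools import combinations


def shortest_path_distances(nodes, edges):
    """All-pairs shortest-path lengths in an undirected weighted graph (Floyd-Warshall)."""
    d = {a: {b: (0.0 if a == b else math.inf) for b in nodes} for a in nodes}
    for a, b, w in edges:
        d[a][b] = min(d[a][b], w)
        d[b][a] = min(d[b][a], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def squared_error(observed, induced, leaves):
    """Sum of squared differences between observed and induced leaf distances."""
    return sum((observed[(a, b)] - induced[a][b]) ** 2 for a, b in combinations(leaves, 2))


leaves = ["A", "B", "C", "D"]
nodes = leaves + ["u", "v"]  # u and v are internal tree nodes
tree = [("A", "u", 1), ("B", "u", 1), ("u", "v", 2), ("C", "v", 1), ("D", "v", 1)]
network = tree + [("B", "C", 2)]  # hypothetical shortcut edge for a B-C exchange
observed = {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 4,
            ("B", "C"): 2, ("B", "D"): 4, ("C", "D"): 2}

tree_error = squared_error(observed, shortest_path_distances(nodes, tree), leaves)
network_error = squared_error(observed, shortest_path_distances(nodes, network), leaves)
print(tree_error, network_error)
```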

Tree representation

Many graphical tools are provided to make tree graphs easy to read and ready to insert into publications or other documents.

Tree comparison

When several data sets are used to construct trees on the same set of units, consensus methods, the maximum agreement subtree and distances between trees are available to compare or synthesize these trees.
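One widely used distance between trees on the same unit set is the Robinson-Foulds distance: the number of bipartitions (splits) of the units found in one tree but not the other. The sketch below assumes trees written as nested tuples of unit labels and treats them as unrooted; the function names are hypothetical, and which tree distances DARwin actually offers is not stated here.

```python
def leaves(tree):
    """Set of unit labels below a node (a str is a leaf, a tuple is an internal node)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the unit set induced by the tree's internal edges."""
    all_leaves = leaves(tree)
    result = set()

    def visit(node):
        clade = leaves(node)
        if not isinstance(node, str):
            for child in node:
                visit(child)
        # Skip trivial splits (one unit vs. the rest) and the whole set.
        if 1 < len(clade) < len(all_leaves) - 1:
            result.add(frozenset([clade, all_leaves - clade]))

    visit(tree)
    return result


def robinson_foulds(t1, t2):
    """Number of splits present in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must be built on the same set of units")
    return len(splits(t1) ^ splits(t2))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(t1, t2))  # prints 2
```

Storing each split as an unordered pair of sides makes the comparison independent of where the nested-tuple representation happens to be rooted.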

Sampling for disequilibria

This part is more specialized: it is devoted to sampling procedures that minimize spurious linkage disequilibria caused by structure within a collection. Two strategies are proposed.
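Why structure creates spurious linkage disequilibrium can be shown with a small worked example. The sketch below computes the r² measure between two biallelic loci, r² = D² / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA pB, for two hypothetical groups in which the loci are independent, then for the pooled sample. The counts and helper names are invented for illustration; the two sampling strategies DARwin proposes are not described here and are not implemented.

```python
def haplotypes_from_counts(counts):
    """Expand {(allele_locus1, allele_locus2): count} into a list of haplotypes (0/1 alleles)."""
    return [h for h, k in counts.items() for _ in range(k)]


def r_squared(haplotypes):
    """Linkage disequilibrium r^2 between two biallelic loci coded 0/1."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


# Hypothetical groups: allele frequencies 0.8 and 0.2, loci independent within each group.
group1 = haplotypes_from_counts({(1, 1): 16, (1, 0): 4, (0, 1): 4, (0, 0): 1})
group2 = haplotypes_from_counts({(1, 1): 1, (1, 0): 4, (0, 1): 4, (0, 0): 16})
pooled = group1 + group2

print(r_squared(group1), r_squared(group2), r_squared(pooled))
```

Within each group r² is zero (up to floating-point rounding), but pooling the groups yields r² = 0.1296 only because allele frequencies differ between them; this is the kind of structure-induced association that a sampling procedure aims to reduce.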