Phylogenetic trees reconstructed from molecular sequences are often considered more reliable than those reconstructed from morphological characters, in part because convergent evolution, which confounds phylogenetic reconstruction, is believed to be rarer for molecular sequences than for morphologies. However, neither the validity of this belief nor the underlying reason is known. Here, by comparing thousands of characters of each type that have actually been used for inferring the phylogeny of mammals, we find that on average morphological characters indeed experience substantially more convergence than amino acid sites, but this disparity is explained by fewer states per character rather than an intrinsically higher susceptibility to convergence for morphologies than sequences. We show by computer simulation and actual data analysis that a simple method for identifying and removing convergence-prone characters improves phylogenetic accuracy, potentially enabling, when necessary, the inclusion of morphologies and hence fossils for reliable tree inference.


Having a reliable species tree is a prerequisite for understanding evolution, which is crucial for making sense of virtually every biological phenomenon. Traditionally, species trees are inferred using morphological, physiological or behavioural characters, collectively called morphological characters hereinafter. The advent of molecular biology supplied numerous molecular characters in the form of DNA and protein sequences, which are often (although not universally) considered more suitable than morphological characters for phylogenetic inference1,2,3,4,5,6. A major reason for this consideration pertains to convergence, which refers to repeated origins of the same character state in multiple evolutionary lineages and is a primary source of error in phylogenetic reconstruction. Compared with morphological characters, molecular characters are thought by many (but not all) to be less susceptible to convergence1,3,4,5,7,8,9,10,11,12. However, this belief appears to have arisen in the early days of molecular systematics, when morphological convergence had long been recognized while molecular convergence had not. Recent genetic and genomic studies, however, revealed a large number of convergences in protein sequence evolution13,14,15,16,17,18,19,20,21,22,23, casting doubt on the above belief. Determining whether morphological characters are more prone to convergence than molecular characters is important for several reasons. First, although morphological and molecular trees are often concordant with each other, this is not always the case2,5,6,24,25. Knowledge of the relative prevalence of convergence in the two types of characters helps decide which tree is more trustworthy. Furthermore, it helps decide whether total evidence trees reconstructed jointly from the two types of characters5,9,10,25,26,27,28 are preferable to trees based on either type alone.
Second, phylogenetic analysis that includes fossils can help understand evolutionary relationships, time and process for fossils as well as extant species2,5,9,10,12,25,26,29. Because molecular characters are inaccessible in the vast majority of fossils, knowing the frequency of morphological convergence is important for assessing the reliability of phylogenies that include fossils. Third, convergence is caused by either repeated adaptations of different evolutionary lineages to similar environmental challenges or chance. Recent studies suggested that most molecular convergence events are attributable to chance18,19,30,31. A comparison between morphological and molecular characters may provide information about the relative roles of selection and drift in morphological evolution.

Because not all morphological or molecular characters are used by phylogeneticists, a fair comparison between the two character types in the context of phylogenetics should focus on characters used for phylogenetic reconstruction. To this end, we analysed a large data set including 3,414 parsimony-informative morphological characters and 5,722 parsimony-informative amino acid sites that was previously compiled for the inference of the mammalian phylogeny of 46 extant and 40 fossil species25. Our analysis focused on extant species because they have both types of characters. We found that morphological characters experience more convergences than molecular characters. We devised a method to identify and remove convergence-prone characters, enabling the inclusion of morphologies and hence fossils for reliable tree inference.

Whole-tree analysis

Analysing character convergence requires a species tree. Because the mammalian tree is not fully resolved, we used three trees, reconstructed respectively using the morphological characters, the molecular characters, and both types of characters in the data set. Under each tree, we inferred the ancestral states at all interior nodes for each character. For each pair of independent branches, we identified characters that showed convergence (Fig. 1a; see Methods) and compared the mean number of convergences per character between morphological and molecular characters. For example, under the morphological tree, the exterior branches leading respectively to wolf (Canis lupus) and aardvark (Orycteropus afer) form an independent branch pair (Supplementary Fig. 1a), where 0.0072 convergences per morphological character were observed, significantly exceeding that (0.0038) per molecular character (P = 0.03, Fisher's exact test; see Methods). Among 3,396 pairs of independent branches in the morphological tree, 79.1% exhibit a higher convergence per morphological character than per molecular character (Fig. 1b), significantly exceeding the chance expectation (P < 10⁻⁴, bootstrap test; see Methods). There are 645 branch pairs with significantly higher per character morphological convergence than molecular convergence (Q-value < 0.05; blue dots in Fig. 1b), whereas the opposite is true for only 61 branch pairs (orange dots in Fig. 1b). The mean number of convergences per morphological character is 1.7 times that per molecular character.


Figure 1. (a) Schematic examples of convergence and divergence. Given the states of the interior and exterior nodes of the tree, the blue and green branch pairs each experienced a convergence event, while the orange branch pair experienced a divergence event. A, L, N and V are four different states of a character. (b) Mean number of convergences per morphological character and that per molecular character for each branch pair examined under the morphological tree. (c) Convergence/divergence (Cv/Dv) ratio for each branch pair under the morphological tree. (d) Mean number of convergences per morphological character and that per molecular character for each branch pair examined under the molecular tree. (e) Cv/Dv ratio for each branch pair under the molecular tree. In b–e, each dot represents a branch pair. In the grey box of each panel, ‘total’ refers to the numbers of dots above and below the diagonal (dots on the diagonal are not counted), respectively, and ‘significant’ refers to the numbers of dots significantly (at Q-value of 0.05) above (blue) and below (orange) the diagonal, respectively. The total number of dots above the diagonal significantly exceeds that below the diagonal in b–e (P < 10⁻⁴, bootstrap test). For c and e, branch pairs with infinite Cv/Dv values are not plotted but are included in the comparison.

It was proposed that convergence is more fairly compared among characters or branch pairs by the ratio between the number of convergences and that of divergences (Cv/Dv; Fig. 1a)15,30, because both Cv and Dv increase with the amount of evolution. Hence, we identified divergence events for each branch pair (see Methods) and then calculated the total number of convergence events relative to the total number of divergence events for the branch pair for each type of character. We found that morphological characters exhibit overwhelmingly larger Cv/Dv than molecular characters (Fig. 1c). The mean Cv/Dv ratio of morphological characters is 4.0 times that of molecular characters.
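
To make the bookkeeping concrete, the sketch below classifies one character on one pair of independent branches and then forms a Cv/Dv ratio from many such calls. It is a minimal illustration rather than the procedure from the Methods: the function names are hypothetical, and it assumes that a convergence means both branches change and end in the same state, while a divergence means both change and end in different states.

```python
from math import inf

def classify_branch_pair(anc1, desc1, anc2, desc2):
    """Classify one character on a pair of independent branches.

    Each branch is described by the inferred state at its ancestral node
    and the state at its descendant node. Assumed definitions: both
    branches must change; equal end states count as convergence and
    different end states as divergence.
    """
    if anc1 == desc1 or anc2 == desc2:
        return "none"  # at least one branch shows no change
    return "convergence" if desc1 == desc2 else "divergence"

def cv_dv_ratio(events):
    """Ratio of convergence to divergence events; inf if no divergence."""
    cv = sum(e == "convergence" for e in events)
    dv = sum(e == "divergence" for e in events)
    if dv == 0:
        return inf if cv > 0 else None  # None: nothing to compare
    return cv / dv

# Hypothetical states for four characters on one branch pair.
events = [
    classify_branch_pair("A", "V", "L", "V"),  # convergence
    classify_branch_pair("A", "L", "A", "N"),  # divergence
    classify_branch_pair("A", "A", "L", "V"),  # no event
    classify_branch_pair("N", "L", "A", "L"),  # convergence
]
print(cv_dv_ratio(events))  # 2.0
```

Returning infinity when no divergence is observed mirrors the infinite Cv/Dv values mentioned in the figure legend.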

If the morphological tree used differs from the unknown true tree, inferring convergence under the morphological tree underestimates morphological convergence, and hence the conclusion of a higher convergence for morphological characters than molecular characters should be conservative. As expected, when the above analyses were repeated under the molecular tree (Supplementary Fig. 1b) or the total evidence tree (Supplementary Fig. 1c), we found even higher convergences (Fig. 1d; Supplementary Fig. 2a) and Cv/Dv ratios (Fig. 1e; Supplementary Fig. 2b) for morphological characters than for molecular characters. Similar results were obtained using conventional measures of homoplasy such as the consistency index (ci) and rescaled consistency index (rc). That is, regardless of the tree topology used, morphological characters show lower ci and rc, and thus higher homoplasy, than molecular characters (Supplementary Fig. 3).
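
For reference, the two homoplasy measures can be computed per character from three step counts. The sketch below uses the standard definitions, ci = m/s and rc = ci × ri with retention index ri = (g − s)/(g − m); the function name and the example counts are hypothetical.

```python
from fractions import Fraction

def consistency_indices(min_steps, observed_steps, max_steps):
    """Return (ci, rc) for one character.

    min_steps (m): fewest changes possible on any tree (states minus one).
    observed_steps (s): changes required on the tree being evaluated.
    max_steps (g): most changes the character can require on any tree.
    """
    if max_steps == min_steps:
        raise ValueError("retention index is undefined when g equals m")
    ci = Fraction(min_steps, observed_steps)
    ri = Fraction(max_steps - observed_steps, max_steps - min_steps)
    return ci, ci * ri

# Hypothetical binary character: 1 change at minimum, 3 on this tree, 5 at most.
ci, rc = consistency_indices(1, 3, 5)
print(ci, rc)  # 1/3 1/6
```

A character with no homoplasy on the tree (s = m) has ci = rc = 1; both values fall as extra changes are required.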

DNA sequences instead of amino acid sequences are sometimes used as molecular characters in phylogenetics. We therefore also conducted a whole-tree analysis of the 19,227 parsimony-informative nucleotide sites in the data set, with the tree inferred from the nucleotide sequences as the molecular tree. Regardless of whether the morphological or molecular tree is used, we observed higher convergence per character and higher Cv/Dv ratio for morphological characters than for nucleotide sites (Supplementary Fig. 4a–d).

Quartet analysis

Because the true mammalian tree is unknown, to ensure a fair comparison between morphological and molecular characters, we further examined every set of four species in the data that show the same phylogenetic relationships in the morphological and molecular trees, which we refer to as quartets (Fig. 2a). Given a quartet and its phylogenetic relationships, a parsimony-informative character is said to be convergent if at least two changes are required to explain the observed states (Fig. 2a). We identified all convergence events for each quartet. Averaged across the 7,146 quartets that can be examined, we observed 0.026 convergences per morphological character, which is three times that per molecular character (0.0085). Higher morphological convergence than molecular convergence is found in 93.9% of the quartets (Fig. 2b), significantly exceeding the chance expectation (P < 10⁻⁴, bootstrap test). A total of 6,087 quartets show significantly higher per character morphological convergence than molecular convergence (Q-value < 0.05; Fig. 2b).


Figure 2. (a) A schematic example of a quartet, which is a set of four species (2, 3, 5 and 6) showing the same phylogenetic relationships in the morphological (left) and molecular (right) trees. Examples of character states exhibiting convergence and consistency are shown. (b) Mean number of convergences per morphological character and that per molecular character for each quartet examined. (c) Convergence/consistency (Cv/Cs) ratio for each quartet. In b and c, each dot represents a quartet. In the grey box of each panel, ‘total’ refers to the numbers of dots above and below the diagonal (dots on the diagonal are not counted), respectively, and ‘significant’ refers to the numbers of dots significantly (at Q-value of 0.05) above (blue) and below (orange) the diagonal, respectively. The total number of dots above the diagonal significantly exceeds that below the diagonal in b and c (P < 10⁻⁴, bootstrap test). In c, quartets with infinite Cv/Cs values are not plotted but are included in the comparison.

Given a quartet and its phylogenetic relationships, a parsimony-informative character is said to be consistent when only one change is needed to explain the observed states. Convergence provides an erroneous phylogenetic signal for the quartet, whereas consistency provides the correct signal. We thus computed, for each quartet, the ratio between the total number of convergences and that of consistencies (Cv/Cs ratio) for each type of character, which may be viewed as the noise/signal ratio. Again, morphological characters tend to have higher Cv/Cs ratios than molecular characters (Fig. 2c). The above results also hold when nucleotide sites instead of amino acid sites are used as molecular characters (Supplementary Fig. 4e,f).
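
The quartet definitions above translate into a very small classifier. The sketch below is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the quartet topology is fixed as ((sp0, sp1), (sp2, sp3)), and a character is treated as informative within the quartet only if it shows exactly two states, each in two species. Under that assumption one change explains the pattern xxyy (consistent), whereas xyxy and xyyx need two changes (convergent).

```python
from math import inf

def quartet_signal(states):
    """Classify one character for the quartet ((sp0, sp1), (sp2, sp3)).

    states: sequence of the four observed states in that species order.
    """
    counts = sorted(states.count(s) for s in set(states))
    if counts != [2, 2]:
        return "uninformative"  # assumed: skip patterns other than 2 + 2
    if states[0] == states[1]:
        return "consistent"     # one change on the internal branch
    return "convergent"         # at least two changes are required

def quartet_cv_cs(characters):
    """Cv/Cs (noise/signal) ratio of one quartet over many characters."""
    signals = [quartet_signal(c) for c in characters]
    cv = signals.count("convergent")
    cs = signals.count("consistent")
    if cs == 0:
        return inf if cv else None
    return cv / cs

# Hypothetical characters: 2 convergent, 3 consistent, 1 uninformative.
print(quartet_cv_cs(["AABB", "ABAB", "LLVV", "AAAB", "NNVV", "ABBA"]))
```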

Number of states per character

We found that 75.2% of parsimony-informative morphological characters are binary (Fig. 3a). Because binary characters can have only one type of change given an ancestral state, they are evidently susceptible to convergence when multiple changes occur. By contrast, only a small fraction (12.4%) of molecular characters are binary (Fig. 3a). The median number of states is five for molecular characters, significantly higher than that (two) for morphological characters (P < 10⁻³⁰⁰, Mann–Whitney U-test).


Figure 3. (a) Frequency distribution of the number of states per character. (b) Cv/Dv ratio of a character decreases as the number of states increases. The Cv/Dv ratio of a character is the sum of convergences across all branch pairs divided by that of divergences. The bottom and top edges of a box represent the first and third quartiles of the distribution, respectively, while the thick line inside the box represents the median. The two whiskers show the maximum value not greater than the third quartile plus 1.5 times the box height and the minimum value not smaller than the first quartile minus 1.5 times the box height, respectively. Cv/Dv ratios are calculated under the morphological tree. The same pattern is observed when Cv/Dv ratios are calculated under the molecular tree (Supplementary Fig. 5).

The probability of convergence relative to that of divergence for a character is expected to decrease with the number of states. Indeed, the Cv/Dv ratio decreases with the number of states for both types of characters (Fig. 3b; Supplementary Fig. 5), and this trend remains after controlling for evolutionary rate (Supplementary Table 1). We estimated that the Cv/Dv ratio of an average morphological character is 0.89 times that of a molecular character with the same number of states, and the corresponding number is 0.55 for Cv/Cs (see Methods). These results indicate that, compared with molecular characters, the higher convergence of morphological characters is caused by having fewer states rather than by an intrinsically higher susceptibility to adaptive convergent evolution, because morphological characters are no more prone to convergence than molecular characters once the number of states is controlled for.
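
A toy calculation shows why the expected Cv/Dv ratio falls as states are added. The sketch below is not the model used in the Methods: it assumes two branches that start from the same ancestral state, both change, and each moves to any of the other k − 1 states with equal probability. The function name is hypothetical.

```python
from fractions import Fraction
from math import inf

def expected_cv_dv(k):
    """Expected chance Cv/Dv ratio for a character with k states.

    Assumed toy model: both branches leave the same ancestral state and
    pick one of the k - 1 other states uniformly and independently.
    """
    if k < 2:
        raise ValueError("a variable character needs at least two states")
    p_conv = Fraction(1, k - 1)  # both branches land on the same state
    p_div = 1 - p_conv           # they land on different states
    return inf if p_div == 0 else p_conv / p_div

for k in (2, 3, 5, 20):
    print(k, expected_cv_dv(k))
```

Under these assumptions the ratio is 1/(k − 2): it is infinite for a binary character, because two changes from the same state can only converge, and it shrinks to 1/3 at five states and 1/18 at twenty.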

The above patterns remain unchanged when nucleotide sites instead of amino acid sites are used as molecular characters (Supplementary Table 2). Interestingly, although there can be no more than four states at each nucleotide site, the median number of states (three) per nucleotide site is still significantly higher than that (two) per morphological character (P < 10⁻³⁰⁰).

Removing convergence-prone characters improves phylogenetics

Because the vast majority of molecular convergences are explainable by chance18,19,30,31, the fact that average morphological characters have even smaller Cv/Dv and Cv/Cs ratios than molecular characters with the same numbers of states suggests that most morphological convergences observed in the data analysed are probably also attributable to chance. If convergence is owing to chance rather than lineage-specific selection, it is possible to identify and remove convergence-prone characters using species with reliable phylogenetic relationships and then infer the tree for species of uncertain relationships using the remaining characters. This approach would be especially useful for phylogenetic inference that includes morphological data because of the relatively frequent convergence in such data. We propose the following procedure for analysing a data set with both morphological and molecular characters. First, we infer the morphological and molecular trees separately. Second, quartets (that is, groups of four species with the same phylogenetic relationships in the two trees) are identified and the Cv/Cs ratio is calculated based on these quartets for each character. Third, we remove all characters whose Cv/Cs ratio exceeds a cutoff and infer the tree using all remaining morphological and molecular characters combined.
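
The third step of the procedure is a simple filter once per-character counts are available. The sketch below assumes those counts already exist (the number of quartets in which each character is convergent and the number in which it is consistent); the function names, character identifiers and counts are hypothetical, and treating a character with no informative quartets as having a ratio of zero is an assumption made here for illustration.

```python
from math import inf

def ratio_or_inf(n_cv, n_cs):
    """Cv/Cs ratio of one character summed over quartets."""
    if n_cs == 0:
        # Assumed convention: no evidence at all counts as ratio 0.
        return inf if n_cv > 0 else 0.0
    return n_cv / n_cs

def low_convergence_characters(counts, cutoff):
    """Keep the characters whose Cv/Cs ratio does not exceed the cutoff.

    counts: dict mapping character id -> (convergent, consistent) counts.
    """
    return [cid for cid, (cv, cs) in counts.items()
            if ratio_or_inf(cv, cs) <= cutoff]

# Hypothetical counts for two morphological and two amino acid characters.
counts = {"morph_1": (6, 2), "morph_2": (1, 10), "aa_1": (0, 5), "aa_2": (3, 0)}
print(low_convergence_characters(counts, cutoff=0.3))  # ['morph_2', 'aa_1']
```

The retained characters, morphological and molecular together, would then be passed to whatever tree inference program is in use.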

To investigate whether the above approach improves phylogenetic accuracy, we conducted 50 simulations of mammalian morphological and molecular characters based on their respective empirical distributions of the number of states (Supplementary Fig. 6a). Quartet analysis demonstrates that the simulated data have properties similar to those of the real data (Supplementary Fig. 6b,c; Supplementary Table 3). We measured the Robinson-Foulds distance (dRF) between an inferred tree and the known true tree in simulation; dRF is twice the number of branch partitions that differ between the two trees32; the smaller the dRF, the more accurate the inferred tree. We found that dRF is significantly greater for the 50 morphological trees than for the 50 molecular trees (P = 1.6 × 10⁻¹⁴, Mann–Whitney U-test), confirming the harm of chance convergence to phylogenetic accuracy. We set 10 Cv/Cs cutoffs from 5 to 0.03 and inferred ten low-convergence total evidence trees for each simulated data set using the procedure proposed above (see Methods). We found that dRF to the true tree is generally smaller for low-convergence trees than for the original tree reconstructed using all characters (green symbols in Fig. 4a), and the improvement in phylogenetic accuracy plateaus when the cutoff reaches 0.3. By contrast, trees based on a random removal of the same number of characters do not show smaller dRF when compared with the original tree (pink symbols in Fig. 4a). As expected, the mean number of states is higher for the remaining low-convergence characters than for the same number of characters randomly picked from the original simulated data (Supplementary Fig. 6d).
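
The distance used to score the simulated trees can be sketched with plain sets. The code below is a minimal illustration with hypothetical function names: each tree is given as a list of clades (groups of species), every clade is turned into an unrooted bipartition, and the distance is the number of bipartitions found in exactly one of the two trees. For two fully resolved trees on the same species, this count equals twice the number of partitions present in one tree but not the other.

```python
def splits(clades, taxa):
    """Turn the clades of a tree into a set of non-trivial bipartitions."""
    taxa = frozenset(taxa)
    result = set()
    for clade in clades:
        side = frozenset(clade)
        other = taxa - side
        if len(side) < 2 or len(other) < 2:
            continue  # trivial split, present in every tree
        result.add(frozenset((side, other)))
    return result

def robinson_foulds(clades_a, clades_b, taxa):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(splits(clades_a, taxa) ^ splits(clades_b, taxa))

# Two hypothetical five-species trees: ((A,B),C,(D,E)) and ((A,C),B,(D,E)).
taxa = "ABCDE"
tree_1 = [("A", "B"), ("D", "E")]
tree_2 = [("A", "C"), ("D", "E")]
print(robinson_foulds(tree_1, tree_2, taxa))  # 2
```

Storing each bipartition as an unordered pair of species sets makes the comparison independent of where either tree happens to be rooted.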


Figure 4. (a) Simulation results showing that using characters with Cv/Cs ratios below certain cutoffs reduces the Robinson-Foulds distance (dRF) between the true tree and the inferred tree, while using the same number of randomly picked characters does not. The bottom and top edges of a box represent the first and third quartiles of the distribution from 50 simulations, respectively, while the thick line inside the box represents the median. The two whiskers show the maximum value not greater than the third quartile plus 1.5 times the box height and the minimum value not smaller than the first quartile minus 1.5 times the box height, respectively. Cv/Cs ratios are estimated based on quartets (sets of four species with the same phylogenetic relationships in the inferred morphological and molecular trees of the simulated data). Asterisks (* and **) mark significance levels from the Mann–Whitney U-test across the 50 simulations. (b,c) Relationships among bats in the total evidence trees inferred before (b) and after (c) removing characters with high Cv/Cs ratios; the latter tree groups the non-echolocator Pteropus giganteus with the echolocator Rhinopoma hardwickii.

Removing convergence-prone characters alters the bat tree

We applied the above pipeline to the mammalian data set including both morphological characters and amino acid sequences. The same 10 Cv/Cs ratio cutoffs as in the simulation were used in removing high-convergence characters, and low-convergence total evidence trees of all 86 extant and fossil species were inferred using the remaining morphological and molecular characters. For the 46 extant species that can be compared, the resultant low-convergence trees are generally more similar to the original molecular tree than are trees based on the same numbers of randomly selected characters (Supplementary Fig. 7). The low-convergence trees are also generally more different from the original morphological tree than are trees based on the same numbers of randomly selected characters (Supplementary Fig. 7). Although the true mammalian tree is unknown, these observations are consistent with our finding that convergence is less frequent in molecular characters than in morphological characters.

Regarding intra-order relationships, the phylogeny of bats has been highly controversial. Specifically, all echolocating bats typically form a monophyletic group in morphological trees, suggesting a single origin of bat echolocation25. Yet they tend to be paraphyletic in molecular trees33,34,35,36, suggesting the possibility of two origins of bat echolocation or one origin followed by a loss. In the original total evidence tree (Supplementary Fig. 8a) reconstructed using the data analysed here, all five extant species of echolocating bats form a monophyletic group to the exclusion of the only non-echolocating extant bat in the data set, Pteropus giganteus, with 99.2% bootstrap support (Fig. 4b). When the 3,930 characters (1,007 morphological and 2,923 molecular) with Cv/Cs ratios below the cutoff are used (Supplementary Fig. 8b), echolocating bats become paraphyletic; the echolocating Rhinopoma hardwickii and the non-echolocating P. giganteus are grouped with 95.0% bootstrap support (Fig. 4c). Note that using low-convergence morphological characters alone does not result in this new topology. For comparison, we generated 50 randomly subsampled data sets, each with 1,007 morphological and 2,923 molecular characters. Although 18 of them also yielded the same topology as in Fig. 4c, the corresponding bootstrap support ranged between 18 and 70%, suggesting that the strong support for the paraphyly of echolocators in Fig. 4c is not explained simply by subsampling of the original data. Our results are not sensitive to the Cv/Cs ratio cutoff, because the same bat relationships were recovered when any Cv/Cs cutoff of 0.3 or smaller was used.

Our analysis of comparably large numbers of morphological and molecular characters previously used in inferring the mammalian tree showed that morphological characters experienced more convergent evolution than molecular characters, confirming a long-held belief of the phylogenetics community. Nevertheless, we caution that our conclusion should be further scrutinized using additional data from additional groups of species, because it is currently based on only one, albeit very large, data set from one group of species. There are three potential sources of error in our inference of convergence. First, use of a wrong species tree could bias our inference. However, as demonstrated, our results are robust to the different species trees used. Second, our inference of convergence relies on ancestral state reconstruction by parsimony, which may contain errors37. However, such errors should be comparable between the two types of characters. Third, it was recently proposed that some inferred convergences may be caused by incomplete lineage sorting rather than real convergent changes38. Like real convergence, apparent convergence owing to incomplete lineage sorting also confounds phylogenetic inference and thus need not be separated from our estimates of convergence. Hence, the three potential errors do not affect our conclusion.

Regarding the reason behind the higher convergence of morphological characters than molecular characters, our results do not support the common view that morphological characters are intrinsically more prone to convergence because they are more frequently subject to positive selection. Instead, we found that the probability of convergence for a character decreases with the number of states, and found no greater intrinsic propensity for convergence (as measured by Cv/Dv and Cv/Cs ratios) among morphological characters than molecular characters after controlling for the number of states. A likely explanation for this unexpected finding is that phylogeneticists have removed morphological characters that are subject to frequent positive selection (for example, body size and coat colour) from phylogenetic analysis, because such characters are known to lack reliable phylogenetic signals39. As a result, the morphological characters used for phylogenetic inference have relatively low intrinsic propensities for convergence. If most convergences of the morphological characters in the data analysed are not manifestations of repeated adaptations but pure chance, one wonders what morphological characters are responsible for the clustering of species with apparently adaptive convergences in the morphological tree, such as the clade of the four ant- and termite-eaters: the nine-banded armadillo Dasypus novemcinctus, collared anteater Tamandua tetradactyla, Chinese pangolin Manis pentadactyla, and aardvark Orycteropus afer (Supplementary Fig. 1a). These species form three independent lineages (Dasypus + Tamandua, Manis, and Orycteropus) in the molecular tree (Supplementary Fig. 1b) as well as the total evidence tree (Supplementary Fig. 1c).
We found that, even on the basis of the molecular tree, at most 14 morphological characters are inferred to have experienced convergence among the three lineages, and the actual number is likely much smaller because, for 13 of the 14 characters, convergence is but one of several equally parsimonious evolutionary scenarios. Moreover, none of the 14 characters is apparently related to ant- and termite-eating or is specific to these four species. For instance, the only character for which the single parsimonious reconstruction indicates convergence among the three lineages describes the shape of the medial border of the humerus trochlea. The humerus is a long bone in the arm or forelimb that runs from the shoulder to the elbow, and trochlea refers to a grooved structure reminiscent of a pulley's wheel. This character does not appear to be related to ant- and termite-eating. In fact, manatee (Trichechus manatus) and ring-tailed lemur (Lemur catta) also have the same state as the four ant- and termite-eating mammals for this character. These findings are consistent with our conclusion that most morphological convergences observed here are caused by chance rather than repeated adaptations. Of course, we cannot exclude the possibility that a small number of morphological convergences observed in this data set are adaptive.

Nevertheless, morphological characters experience more convergences than molecular characters, because the former have far fewer states than the latter. The low number of states per morphological character may be related to one or both of the following reasons7,10. First, curating multistate morphological characters may be more subjective and error-prone, resulting in a reduced use of such characters in phylogenetics40. Second, most morphological characters may have a small state space, making multistate characters difficult to find41.

Because of the higher prevalence of convergence among morphological characters than molecular characters and the rapid accumulation of molecular sequence data, we suggest that phylogenetic reconstruction should generally use only molecular data. In the event that molecular data are inaccessible for some taxa such as fossils, one should consider using morphological characters with relatively large numbers of states to minimize convergence in phylogenetic analysis.

Given a data set of morphological and molecular characters, we proposed a method to reconstruct more accurate total evidence trees by identifying and removing convergence-prone characters in the data set, and demonstrated its validity by computer simulation. Homoplasy, which interferes with phylogenetic inference, includes reversal in addition to convergence. While our study focuses on convergence, it is worth noting that convergence-prone characters are also expected to be reversal-prone if most convergences are chance events owing to the availability of only a few states, as suggested by the present data. Thus, in removing convergence-prone characters, we effectively also remove many reversal-prone characters; the success of our method may be in part attributable to this effect. Because our method relies on the assumption that characters that are convergence-prone in the quartets analysed are also convergence-prone in other species, it is not effective in removing characters that are convergence-prone in a few specific lineages, such as those subject to adaptive convergence. In principle, one could also downweight instead of remove convergence-prone characters, but the optimal weights are unknown. Future studies can investigate how to obtain the optimal weights for improving phylogenetic accuracy.

We showed that the original total evidence mammalian tree, in which all echolocating bats form a monophyletic group, is altered upon the removal of convergence-prone characters. The low-convergence tree shows a paraphyly of echolocating bats, matching the recently published genome-based bat phylogeny34. Assuming that the genome-based tree is correct, our results demonstrate the utility of our method in actual phylogenetic inference by the total evidence approach. In addition, our low-convergence tree supports the monophyly of pangolin (Manis pentadactyla) and carnivores (Supplementary Fig. 8b), which is not reflected in the original total evidence tree (Supplementary Fig. 8a) but is supported by previous molecular studies33,42. As shown by our computer simulation, although removing convergence-prone characters improves phylogenetic accuracy, low-convergence trees may still contain errors. Identifying and removing convergence-prone characters is by no means a panacea for phylogenetics. While rapidly accumulating genome sequences will eventually dwarf the morphological data of any extant species, morphological data will remain useful in phylogenetic analysis that needs to include fossils, whose value to understanding evolution is indispensable. For this reason, understanding and remedying convergence, which is more prevalent in morphological than molecular characters, will remain an important task in phylogenetics. Of course, morphological characters that can be studied in fossils do not represent a random sample of all morphological characters. Whether this nonrandomness will bias phylogenetic inference43 is also worth investigation.