Abstract | Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. |
Abstract | An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. |
Abstract | Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. |
Author Summary | We consider here an elementary question for the inference of phylogenetic networks: which networks can be reconstructed? |
Author Summary | Indeed, whereas in theory it is always possible to reconstruct a phylogenetic tree, given sufficient data for this task, the same does not hold for phylogenetic networks: most notably, the relative order of consecutive reticulate events cannot be determined by standard network inference methods. |
Author Summary | Here we propose limiting the space of reconstructible phylogenetic networks to what we call “canonical networks”. |
Introduction | Explicit [1] or evolutionary [2, 3] phylogenetic networks are used to represent the evolution of organisms or genes that may inherit genetic material from more than one source. |
Introduction | They are called “explicit” to distinguish them from “implicit” [14], “abstract” [1] or “data-display” [3] phylogenetic networks, which are used to display collections of alternative evolutionary hypotheses supported by conflicting signals in the data. |
Introduction | This observation gives rise to the notion of trees displayed by a network, which are all the possible single-character histories implied by a phylogenetic network. |
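The notion of displayed trees can be illustrated with a minimal sketch. It assumes a common textbook construction that the excerpt itself does not spell out: a displayed tree is obtained by keeping exactly one incoming edge at each reticulation node. The function name `displayed_trees`, the child-to-parents dictionary encoding, and the toy network are all hypothetical, and each tree is summarised only by its set of clusters (the leaf sets below its internal nodes).

```python
from itertools import product


def displayed_trees(parents, leaves):
    """Enumerate the trees displayed by a rooted network.

    parents maps each non-root node to the list of its parents (an
    illustrative encoding). Each tree is returned as a frozenset of
    clusters, so nodes left with a single child are suppressed implicitly.
    """
    reticulations = [v for v, ps in parents.items() if len(ps) > 1]
    trees = set()
    for choice in product(*(parents[r] for r in reticulations)):
        # Keep exactly one incoming edge per reticulation node.
        kept = {v: ps[0] for v, ps in parents.items() if len(ps) == 1}
        kept.update(zip(reticulations, choice))
        # Walk from each leaf to the root, recording the leaf at every ancestor.
        below = {}
        for leaf in leaves:
            node = leaf
            while node in kept:
                node = kept[node]
                below.setdefault(node, set()).add(leaf)
        clusters = frozenset(frozenset(c) for c in below.values() if len(c) > 1)
        trees.add(clusters)
    return trees


# Toy network: leaf b hangs below reticulation h, which has parents u and v.
toy_network = {
    "u": ["r"], "v": ["r"],
    "a": ["u"], "c": ["v"],
    "h": ["u", "v"], "b": ["h"],
}
toy_trees = displayed_trees(toy_network, ["a", "b", "c"])
```

In this toy network the two choices at `h` group `b` either with `a` or with `c`, so two distinct trees are displayed. The enumeration grows exponentially with the number of reticulations, which is acceptable for an illustration but not for large networks.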
Abstract | The phylogenetic clades are weighted and ranked according to their abundance measuring their contribution to the differentiation of the classes of samples, and a criterion is provided to define a reduced set of most relevant clades. |
Abstract | Applying the method to public datasets, we show that the data-driven definition of relevant phylogenetic clades accomplished by our ranking strategy identifies features in the samples that are lost if phylogenetic relationships are not considered, improving our ability to mine metagenomic datasets. |
Abstract | Comparison with supervised classification methods currently used in metagenomic data analysis highlights the advantages of using phylogenetic information. |
Applications | 2B we show a phylogenetic tree of the OTUs present in the samples, with those included in the 30 most relevant clades identified by PhyloRelief highlighted (in red OTUs more prevalent in Malawi, Burkina Faso and Venezuela, in green OTUs more prevalent in the USA and Italy). |
Author Summary | Here we present PhyloRelief, a novel feature-ranking algorithm that fills this gap by integrating the phylogenetic relationships amongst the taxa into a statistical feature weighting procedure. |
Discussion | PhyloRelief is an algorithm that resolves the problem of relevant taxa identification by applying the Relief strategy of feature ranking in a phylogenetic context. |
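For orientation, the generic Relief weighting that PhyloRelief builds on can be sketched as follows. This is the classic two-class Relief on plain feature vectors, not the phylogenetic variant described in the paper. It assumes features scaled to [0, 1], at least two samples per class, and a deterministic pass over every sample in place of random sampling. The function name `relief` and the toy data are hypothetical.

```python
def relief(X, y):
    """Classic Relief weights for two classes (illustrative sketch).

    X is a list of feature vectors scaled to [0, 1]; y holds the class labels.
    Features that differ at the nearest miss and agree at the nearest hit
    receive larger weights.
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(a, b):
        # Manhattan distance between two samples.
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            diff_miss = abs(X[i][f] - X[miss][f])
            diff_hit = abs(X[i][f] - X[hit][f])
            weights[f] += (diff_miss - diff_hit) / n
    return weights


# Toy data: feature 0 separates the classes, feature 1 does not.
toy_X = [[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]]
toy_y = [0, 0, 1, 1]
toy_weights = relief(toy_X, toy_y)
```

On the toy data the discriminating feature ends up with a positive weight and the uninformative one with a negative weight, which is the ranking behaviour the Relief strategy relies on.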
Introduction | Given that the sequences of marker genes are available, phylogenetic measures of diversity such as UniFrac [19,20] have proven to be able to identify subtle differences in the structures of microbial communities by weighting species abundances with the phylogenetic relationships amongst taxa. |
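As a rough illustration of a phylogenetic diversity measure, the sketch below computes the unweighted form of UniFrac: the fraction of covered branch length that leads to taxa from only one of two samples. It is a simplification, since it ignores the abundances that the weighted variant takes into account. The function name `unweighted_unifrac`, the child-to-(parent, length) encoding, and the toy tree are hypothetical, and at least one sample is assumed to be non-empty.

```python
def unweighted_unifrac(edges, sample_a, sample_b):
    """Unweighted UniFrac distance between two samples (illustrative sketch).

    edges maps each non-root node to (parent, branch_length); a branch is
    identified by its child node. Samples are sets of leaf names.
    """
    def covered(sample):
        # Collect every branch on the path from each observed leaf to the root.
        branches = set()
        for leaf in sample:
            node = leaf
            while node in edges:
                branches.add(node)
                node = edges[node][0]
        return branches

    def total_length(branches):
        return sum(edges[v][1] for v in branches)

    in_a, in_b = covered(sample_a), covered(sample_b)
    # Unique branch length divided by all branch length covered by either sample.
    return total_length(in_a ^ in_b) / total_length(in_a | in_b)


# Toy tree ((x:1, y:1)u:1, z:2)r with two overlapping samples.
toy_edges = {"x": ("u", 1.0), "y": ("u", 1.0), "u": ("r", 1.0), "z": ("r", 2.0)}
toy_distance = unweighted_unifrac(toy_edges, {"x", "y"}, {"y", "z"})
```

Here the branches to `x` and `z` are unique to one sample (length 3 in total) out of a covered length of 5, so the distance is 0.6.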
Predictivity of the ranked features in supervised classification problems | Identifying a ranking strategy to reduce the dimensionality of the dataset can improve the effectiveness of classification algorithms in metagenomic datasets, where correlations between the variables are introduced both by the phylogenetic relationships between the clades and by the fact that relative abundances are measured. |
Results | PhyloRelief is an algorithm that introduces the Relief [21,22] strategy of feature weighting in a phylogenetic context to identify those OTUs or groups of OTUs that are responsible for the differentiation between classes of samples (i.e. |
Results | The process requires that the samples are unambiguously classified into cases and controls according to the description provided by the study design, and that a phylogenetic tree of the OTUs has been obtained by molecular phylogenetic analysis. |
Results | the fraction of the phylogenetic tree from which descend only OTUs belonging to one of the classes; b) a weighted update function, in which each branch of the tree is weighted by a quantity proportional to its unbalance between the classes, i.e. |
Discussion | The viral population within each patient has descended from founder viruses and the population at the time of sampling may have some background correlation due to phylogenetic similarity. |
Discussion | confounded by such phylogenetic effects [40–44], and a large literature has developed to account for such biases [44,45]. |
Discussion | Firstly, strong selection pressure can create the environment for convergent evolution in which covariation dominates over phylogenetic effects [42,46,47]; indeed, drug resistance selection from reverse transcriptase (RT) inhibitors has been reported to generate a higher evolution rate in RT, thus fixing mutations, as compared to viral genes not under drug selection, such as envelope [48]. |
Phylogenetic correction to MI | Phylogenetic correction to MI |
Phylogenetic correction to MI | We recognize that the Mutual Information (MI) does not account for correlations which arise from phylogenetic relationships among the population of interest. |
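For reference, the uncorrected quantity can be sketched as the plug-in estimate of mutual information between two alignment columns, in bits. This sketch applies no phylogenetic correction of any kind. The function name `mutual_information` and the toy columns are hypothetical, and frequencies are taken directly from counts with no pseudocounts or finite-sample adjustment.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Plug-in mutual information (bits) between two alignment columns.

    col_a and col_b are equal-length sequences holding the residue observed
    at each of two positions, one entry per sequence in the alignment.
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    # Sum p(a,b) * log2(p(a,b) / (p(a) * p(b))) over observed pairs only.
    return sum(
        (c / n) * log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )


# Perfectly covarying columns carry 1 bit; independent columns carry 0 bits.
mi_covarying = mutual_information("AAGG", "CCTT")
mi_independent = mutual_information("AAGG", "CTCT")
```

Because the estimate depends only on co-occurrence counts, two positions that covary merely through shared ancestry score as highly as two that covary through selection, which is the limitation the sentence above acknowledges.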
Phylogenetic correction to MI | In this specific study, where there is the population within each patient and the combined population of all patients, any phylogenetic correction to MI will only reduce phylogenetic influence in the combined population. |