Proportionality: A Valid Alternative to Correlation for Relative Data
David Lovell, Vera Pawlowsky-Glahn, Juan José Egozcue, Samuel Marguerat, Jürg Bähler

Abstract

With such relative (or compositional) data, differential expression needs careful interpretation, and correlation, a statistical workhorse for analyzing pairwise relationships, is an inappropriate measure of association. Using yeast gene expression data we show how correlation can be misleading and present proportionality as a valid alternative for relative data. We show how the strength of proportionality between two variables can be meaningfully and interpretably described by a new statistic φ, which can be used instead of correlation as the basis of familiar analyses and visualisation methods, including co-expression networks and clustered heatmaps. While the main aim of this study is to present proportionality as a means to analyse relative data, it also raises intriguing questions about the molecular mechanisms underlying the proportional regulation of a range of yeast genes.

Author Summary

Correlation is popular as a statistical measure of pairwise association but should not be used on data that carry only relative information. Using timecourse yeast gene expression data, we show how correlation of relative abundances can lead to conclusions opposite to those drawn from absolute abundances, and that its value changes when different components are included in the analysis. Once all absolute information has been removed, only a subset of those associations will reliably endure in the remaining relative data, specifically, associations where pairs of values behave proportionally across observations. We propose a new statistic φ to describe the strength of proportionality between two variables and demonstrate how it can be straightforwardly used instead of correlation as the basis of familiar analyses and visualization methods. This is a PLOS Computational Biology Methods paper.

Introduction

Sometimes, researchers are interested in the relative abundance of different components. Other times, they have to make do with relative abundance to gain insight into the system under study. Whatever the case, data that carry only relative information need special treatment.

Compositional data analysis [4] (CODA) is a valid alternative that harks back to Pearson’s observation [5] of ‘spurious correlation’, i.e., while statistically independent variables X, Y, and Z are not correlated, their ratios X/Z and Y/Z must be, because of their common divisor. (Note: this differs from the logical fallacy that “correlation implies causation”.) Proportions, percentages and parts per million are familiar examples of compositional data; the fact that the representation of their components is constrained to sum to a constant (i.e., 1, 100, 10^6) emphasizes that the data carry only relative information. Note that compositional data do not necessarily have to sum to a constant; what is essential is that only the ratios of the different components are regarded as informative.
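Pearson's observation can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes three independent uniform variables (a hypothetical choice of distribution, sample size and seed) and compares the correlation of X and Y with that of the ratios X/Z and Y/Z.

```python
import random
import statistics


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 20000               # hypothetical sample size

# Three statistically independent positive variables (assumed uniform on [1, 2]).
x = [rng.uniform(1, 2) for _ in range(n)]
y = [rng.uniform(1, 2) for _ in range(n)]
z = [rng.uniform(1, 2) for _ in range(n)]

r_xy = pearson(x, y)
r_ratio = pearson([a / c for a, c in zip(x, z)],
                  [b / c for b, c in zip(y, z)])

print(f"corr(X, Y)     = {r_xy:.3f}")     # close to zero
print(f"corr(X/Z, Y/Z) = {r_ratio:.3f}")  # clearly positive
```

The ratios share the divisor Z, so they move together even though X and Y do not.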

Problems with correlation can also be demonstrated geometrically (Fig. 1): the bivariate joint distribution of relative abundances says nothing about the distribution of absolute abundances that gave rise to them. Thus, relative data is also problematic for mutual information and other distributional measures of association. To further illustrate how correlation can be misleading we applied it to absolute and relative gene expression data in fission yeast cells deprived of a key nutrient [6].

We show how proportionality provides a valid alternative to correlation and can be used as the basis of familiar analyses and visualizations. We conclude by putting this analysis strategy in perspective, discussing challenges, caveats and issues for further work, as well as the biological questions raised in this study.

Results

Data on absolute mRNA abundance

We used data from [6] on the absolute levels of gene expression (i.e., mRNA copies per cell) in fission yeast after cells were deprived of a key nutrient (Fig. 2). Unlike many experiments where researchers ensure (or assume) cells produce similar amounts of mRNA across conditions [7], this experiment ensured cells produced very different amounts so as to illustrate the merits of absolute quantification (S1 Fig.). Total abundance may vary dramatically in other experimental settings, such as in comparing diseased and normal tissues, tissues at different stages of development, or microbial communities in different environments. To illustrate the key points of this paper, we worked with positive data only (i.e., we excluded records with any zero or NA values): measurements of 3031 components (i.e., mRNAs) at 16 time points. Furthermore, we applied analysis methods (specifically, correlation) to the absolute abundance data without transformation (e.g., taking logarithms) because we believe this approach yields useful insights and simplifies the presentation of the central ideas of this paper (see [8] and S1 Supporting Information).

Challenges in interpreting “differential expression”

Before looking at issues with pairs of components, it is important to note that interpreting differences in the relative abundance of a single component can be challenging.

Tests for differential expression are popular for analyzing relative data in bioscience. Much attention has been given to dealing with small numbers of observations and large numbers of tests, but comparatively little to “...the commonly believed, though rarely stated, assumption that the absolute amount of total mRNA in each cell is similar across different cell types or experimental perturbations” [7].

The relationship between the relative and absolute abundance of a component can be understood in terms of fold change over time. When total absolute abundance of mRNA stays constant, fold changes in both absolute and relative abundance of each mRNA are equal. When total absolute abundance varies, fold changes in absolute and relative abundances of each mRNA are no longer equal and can change in different directions. Between 0 and 3 hours there were 1399 yeast mRNAs whose absolute abundance decreased, and whose relative abundance increased. Clearly, mRNAs are being expressed differently, but to describe them as “under- or over-expressed” is too simplistic: here lies the interpretation challenge (see S1 Supporting Information).
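A small worked example may help here. The numbers below are hypothetical (they are not taken from the yeast data); the sketch only illustrates that the relative fold change equals the absolute fold change divided by the fold change in the total, so the two can point in opposite directions when the total falls.

```python
def fold_change(before, after):
    """Fold change of a quantity between two time points."""
    return after / before


# Hypothetical mRNA: copies per cell and total mRNA copies per cell at 0 h and 3 h.
abs_t0, total_t0 = 100.0, 10000.0
abs_t3, total_t3 = 80.0, 4000.0

abs_fc = fold_change(abs_t0, abs_t3)                        # absolute abundance
total_fc = fold_change(total_t0, total_t3)                  # total abundance
rel_fc = fold_change(abs_t0 / total_t0, abs_t3 / total_t3)  # relative abundance

print(f"absolute fold change: {abs_fc:.2f}")  # below 1: fewer copies per cell
print(f"relative fold change: {rel_fc:.2f}")  # above 1: larger share of the total
print(f"absolute / total:     {abs_fc / total_fc:.2f}")
```

Here the mRNA loses copies in absolute terms yet doubles its share, because the total shrank faster than it did.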

Correlations between relative abundances tell us absolutely nothing

While “differential expression” of relative abundances is challenging to interpret, in the absence of any other information or assumptions, correlation of relative abundances is just wrong. We stress “in the absence of any other information or assumptions” to highlight the common assumption of constant absolute abundance of total mRNA across all experimental conditions. If this assumption holds, and all the mRNAs comprising that total are considered, the relative abundance of each kind of mRNA will be proportional to its absolute abundance, and analyses of correlation or “differential expression” of the relative values will have clear interpretations. The revisitation of this assumption [7] should raise alarm bells about the inferences drawn from many gene expression studies.

Fig. 1(a) shows why correlation between relative abundances tells us nothing about the relationship between the absolute abundances that gave rise to them: the perfectly correlated relative abundances could come from any set of absolute abundance pairs that lie on the rays from the origin. This many-to-one mapping means that other measures of statistical association (e.g., rank correlations or mutual information) will not tell us anything either when applied to purely relative data.
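The many-to-one mapping can be illustrated with a short sketch. The two tables of absolute abundances below are hypothetical: the second is the first with every time point rescaled by an arbitrary positive factor. Both yield identical relative abundances, yet the correlation between the absolute abundances of the first two components has opposite sign in the two tables.

```python
import statistics


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def closure(row):
    """Convert absolute amounts at one time point to relative amounts."""
    total = sum(row)
    return [v / total for v in row]


# Hypothetical absolute abundances: rows are time points, columns are components.
absolute_a = [
    [10.0, 35.0, 55.0],
    [12.0, 30.0, 58.0],
    [15.0, 25.0, 60.0],
    [20.0, 18.0, 62.0],
    [25.0, 20.0, 55.0],
]
# A second system: each time point rescaled by an arbitrary positive factor.
factors = [8.0, 4.0, 2.0, 1.0, 0.5]
absolute_b = [[f * v for v in row] for f, row in zip(factors, absolute_a)]

relative_a = [closure(row) for row in absolute_a]
relative_b = [closure(row) for row in absolute_b]


def column(table, j):
    return [row[j] for row in table]


corr_a = pearson(column(absolute_a, 0), column(absolute_a, 1))
corr_b = pearson(column(absolute_b, 0), column(absolute_b, 1))

print(f"absolute correlation, system A: {corr_a:.2f}")
print(f"absolute correlation, system B: {corr_b:.2f}")
```

Since the relative data cannot distinguish system A from system B, no statistic computed from them alone can recover either absolute correlation.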

A rare issue? Consider the red mRNA pair in Fig. 2: while their absolute abundances over time are strongly positively correlated, if someone (inappropriately) used correlation to measure the association between the relative abundances of these two mRNAs they would form the opposite view (Fig. 3(a)); correlation between the blue mRNA pair in Fig. 2 is similarly misleading (S2 Fig.). What of the other 4.5 million pairs of mRNAs? Fig. 3(b) summarizes all discrepancies between correlations of absolute abundance, and correlations of relative abundance, showing clearly that the apparent correlations of relative abundances tell a very different story from those of the absolute data. So how should we go about analyzing these relative data?

Principles for analyzing relative data

CODA theory provides three principles [4, 9]:

1. Scale invariance: analyses must treat vectors with proportional positive components as representing the same composition.
2. Subcompositional coherence: inferences about subcompositions (subsets of components) should be consistent, regardless of whether the inference is based on the subcomposition or the full composition.
3. Permutation invariance: the conclusions of analyses must not depend on the order of the components.

Correlation is not subcompositionally coherent: its value depends on which components are considered in the analysis, e.g., if you deplete the most abundant RNAs from a sample [10] and use correlation to measure association between relative abundances, you get different correlations.
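The lack of subcompositional coherence can be seen in a toy example. The sketch below uses hypothetical abundances of four components, the last of which dominates and declines over time; it compares the correlation between the relative abundances of the first two components before and after the dominant component is removed and the remainder is re-closed.

```python
import statistics


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def closure(row):
    """Convert amounts at one time point to relative amounts."""
    total = sum(row)
    return [v / total for v in row]


# Hypothetical data: rows are time points; columns are components A, B, C, D.
samples = [
    [10.0, 20.0, 30.0, 940.0],
    [12.0, 20.0, 30.0, 600.0],
    [15.0, 20.0, 30.0, 300.0],
    [20.0, 20.0, 30.0, 150.0],
    [25.0, 20.0, 30.0, 80.0],
]

full = [closure(row) for row in samples]      # full composition
sub = [closure(row[:3]) for row in samples]   # D depleted, then re-closed

r_full = pearson([row[0] for row in full], [row[1] for row in full])
r_sub = pearson([row[0] for row in sub], [row[1] for row in sub])

print(f"corr(A, B), full composition: {r_full:.2f}")
print(f"corr(A, B), subcomposition:   {r_sub:.2f}")
```

The same pair of components appears positively associated in one analysis and negatively associated in the other, purely because of what else was included.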

Proportionality is meaningful for relative data

Proportionality obeys all three principles for analyzing relative data. If relative abundances x and y are proportional across experimental conditions i, their absolute abundances must be in

φ is related to logratio variance [4], var(log(x/y)), and is zero when x and y behave perfectly proportionally. However, when x and y are not proportional, φ has both a clear geometric interpretation and a meaningful scale, addressing concerns raised about logratio variance [3]: the closer φ is to zero, the stronger the proportionality. We consider “strength” of proportionality (goodness-of-fit) rather than testing the hypothesis of proportionality because it allows us to compare relationships between different pairs of mRNAs (S1 Supporting Information). We calculated φ for the relative abundances of all pairs of mRNAs and compared it to the correlations between their absolute abundances (S4 Fig): clearly, the absolute abundances of most mRNA pairs are strongly positively correlated; far fewer are also strongly proportional. Focusing on these strongly proportional mRNAs, we extracted the 424 pairs with φ < 0.05. We graphed the network of relationships between these mRNAs (S5 Fig.), an approach similar to gene co-expression network [12] or weighted gene co-expression analysis [13] but founded on proportionality and therefore valid for relative data. The network revealed one cluster of 96, and many other smaller clusters of mRNAs behaving proportionally across conditions. Using φ as a dissimilarity measure, we formed heatmaps of the three largest clusters similar to the method of Eisen et al. [14] but, again, using proportionality not correlation.

Discussion

This paper does not deny pairwise statistical associations between absolute abundances. What it does say is that once all the absolute information has been removed, only a subset of those associations will reliably endure in the remaining relative data, specifically, associations where values behave proportionally across observations.

Other approaches to compositional data in the molecular biosciences

Strategies have been proposed to ameliorate spurious correlation in the analysis of relative abundances [2, 3]. We contend that there is no way to salvage a coherent interpretation of correlations from relative abundances without additional information or assumptions; our argument is based on Fig. 1.

Aitchison articulates problems with this approach [4, pp. 56-58]. SparCC [3] injects additional information by assuming the number of different components is large and the true correlation network is sparse. This equates to assuming “that the average correlations [between absolute abundances] are small, rather than requiring that any particular correlation be small” [3, Eq. 14]. This means the expected value of the total absolute abundance will be constant (as the sum of many independently distributed amounts). We are concerned with situations where that assumption cannot be made, or where the aim is to describe associations between relative amounts.

Caution about correlation

This is highly relevant to gene coexpression networks [12]. Correlation is at the heart of methods like Weighted Gene Co-expression Network Analysis [13] and heatmap visualization [14]. These methods are potentially misleading if applied to relative data. This concern extends to methods based on mutual information (e.g., relevance networks [17]) since, as Fig. 1 shows, the bivariate joint distribution of relative abundances (from which mutual information is estimated) can be quite different from the bivariate joint distribution of the absolute abundances that gave rise to them.

Currently, there are many gene co-expression databases available that provide correlation coefficients for the relative expression levels of different genes, generally from multiple experiments with different experimental conditions (see e.g., [18]). As far as we are aware, none of the database providers explicitly address whether absolute levels of gene expression were constant across experimental conditions. If the answer to this question is “no”, we would not recommend these correlations be used for the reasons demonstrated in this paper. If the answer is “yes” we still advocate caution in applying correlation to absolute abundances for reasons discussed in S1 Supporting Information.

Results in relation to genome regulation in fission yeast

While the main aim of this study is to present and illustrate principles for analyzing relative abundances, it has also uncovered intriguing biological insight with respect to gene regulation.

The absolute levels of these mRNAs decrease after removal of nitrogen [6]. The notable coherence in biological function among the mRNAs in this cluster is higher than typically seen when correlative similarity metrics for clustering are applied (e.g., [19]). These 96 mRNAs show remarkable proportionality to each other over the entire timecourse (S8 Fig), and maintain near constant ratios across all conditions (S9 Fig). Given the huge energy invested by yeast cells for protein translation (most notably ribosome biogenesis [20, 21]), it certainly makes sense for cells to synchronize the expression of relevant genes such that translation is finely tuned to nutritional conditions. Evidently, numerous ribosomal proteins and RNAs function together in the ribosome, demanding their coordinated expression; more surprisingly, multiple other genes, with diverse functions in translation, show equally pronounced proportional regulation across the time-course. These findings raise intriguing questions as to the molecular mechanisms underlying this proportional regulation, suggesting sophisticated, coordinated control of numerous mRNAs at both transcriptional and post-transcriptional levels of gene expression.

Challenges and future work

First is the treatment of zeroes, for which there is currently no simple general remedy [22]. Second, and related, is the fact that “many things that we measure and treat as if they are continuous are really discrete count data, even if only at the molecular extremes” [23] and count data is not purely relative: the count pair (1, 2) carries different information than counts of (1000, 2000) even though the relative amounts of the two components are the same. Correspondence analysis [24], or methods based on count distributions (e.g., logistic regression and other generalized linear models) may provide ways forward.
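One way to see why the count pairs (1, 2) and (1000, 2000) differ is to look at the uncertainty attached to the same estimated proportion. The sketch below assumes a simple binomial sampling model, which is an illustrative assumption and not a model proposed in this paper.

```python
import math


def proportion_with_se(count_a, count_b):
    """Share of component A and its standard error under an assumed binomial model."""
    n = count_a + count_b
    p = count_a / n
    return p, math.sqrt(p * (1 - p) / n)


p_small, se_small = proportion_with_se(1, 2)
p_large, se_large = proportion_with_se(1000, 2000)

print(f"counts (1, 2):       p = {p_small:.3f}, standard error = {se_small:.4f}")
print(f"counts (1000, 2000): p = {p_large:.3f}, standard error = {se_large:.4f}")
```

Both pairs give the same relative amount, but the small counts support it far less precisely, which is information a purely relative treatment discards.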

Methods

Reproducing this research

All data and code [25] needed to reproduce the analyses and visualizations set out in this paper are contained in the Supporting Information, along with additional illustrations and detailed explanations.

Measuring proportionality

Aitchison [4] proposed logratio variance, var(log(x/y)), as a measure of association for variables that carry only relative information. When x and y are exactly proportional var(log(x/y)) = 0, but when x and y are not exactly proportional, “it is hard to interpret as it lacks a scale. That is, it is unclear what constitutes a large or small value... (does a value of 0.1 indicate strong dependence, weak dependence, or no dependence?)” [3]. Logratio variance can be factored into two more interpretable terms:

$$\operatorname{var}(\log(x/y)) = \operatorname{var}(\log x)\,\bigl(1 + \beta^{2} - 2\beta\,|r|\bigr) = \operatorname{var}(\log x)\,\phi(\log x, \log y) \tag{2}$$

where β is the standardized major axis estimate [26] of the slope of random variables log y on log x, and r the correlation between those variables. The first term in Equation 2, var(log x), is solely about the magnitude of variation at play and has nothing to do with y. The second term, φ, describes the degree of proportionality between x and y, and forms the basis of our analysis of the relationships between relative values. Other nonnegative functions of β and r that are zero when x and y are perfectly proportional could be formed; this is described in more detail in S1 Supporting Information, as well as why φ is preferable to an hypothesis testing approach. There is no need to calculate β or r to assess strength of proportionality; they simply provide a clear geometric interpretation of φ; in practice, one can use the relationship φ(log x, log y) = var(log(x/y))/var(log x).
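The relationship φ(log x, log y) = var(log(x/y))/var(log x) is straightforward to compute. The sketch below is an illustration rather than the authors' code: the function names and data are hypothetical, and the second function assumes that the standardized major axis slope is sign(r)·sd(log y)/sd(log x), in which case φ can equivalently be written as 1 + β² − 2β|r|.

```python
import math
import statistics


def phi(x, y):
    """phi(log x, log y) computed as var(log(x/y)) / var(log x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    log_ratio = [a - b for a, b in zip(lx, ly)]
    return statistics.variance(log_ratio) / statistics.variance(lx)


def phi_from_slope(x, y):
    """The same quantity via slope beta and correlation r (assumed SMA slope)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    sx, sy = statistics.stdev(lx), statistics.stdev(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / (len(lx) - 1)
    r = cov / (sx * sy)
    beta = math.copysign(sy / sx, r)  # standardized major axis slope of log y on log x
    return 1 + beta ** 2 - 2 * beta * abs(r)


# Hypothetical positive measurements across five conditions.
x = [2.0, 4.0, 8.0, 16.0, 32.0]
y_proportional = [3.0 * v for v in x]    # exactly proportional to x
y_other = [5.0, 7.0, 20.0, 30.0, 90.0]   # roughly, but not exactly, proportional

print(f"phi, proportional pair: {phi(x, y_proportional):.6f}")
print(f"phi, other pair:        {phi(x, y_other):.6f}")
print(f"phi via beta and r:     {phi_from_slope(x, y_other):.6f}")
```

The two routes agree because var(log x − log y) = var(log x) + var(log y) − 2 cov(log x, log y), and dividing through by var(log x) gives the expression in β and r.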

Alternative measures of proportionality

The φ statistic is a measure of goodness-of-fit to proportionality that combines two quantities of interest: β, the slope of the line best describing the relationship between random variables log x and log y; and r, whose magnitude estimates the strength of the linear relationship between log x and log y. “Goodness-of-fit” describes how well a statistical model fits a set of observations and is a familiar concept in regression, including linear and generalised linear models, but note that φ, specifically the slope (β) of the standardized major axis, is motivated by allometry rather than regression modeling. We are interested in assessing whether two variables are directly proportional, rather than predicting one from the other: “use of regression would often lead to an incorrect conclusion about whether two variables are isometric or not” [26, p.265]. Note also that ordinary least squares regression fits are not symmetric: in general, the slope of y regressed on x is different to the slope of x regressed on y [27]. While goodness-of-fit measures for regression may not generally be appropriate for assessing proportionality, Zheng [28] explores the concordance correlation coefficient ρc [29] which could be modified to provide an alternative measure of proportionality defined as

$$\rho_p(\log x, \log y) = \frac{2\,\operatorname{cov}(\log x, \log y)}{\operatorname{var}(\log x) + \operatorname{var}(\log y)}$$

and related to var(log(x/y)) by the terms in Equation 1. This “proportionality correlation coefficient” ranges from −1 (perfect reciprocality) to +1 (perfect proportionality) and lacks the clear geometric interpretation of φ.
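The behaviour of such a coefficient at its extremes can be checked with a short sketch. It assumes the coefficient takes the form 2 cov(log x, log y) / (var(log x) + var(log y)), i.e., the concordance correlation coefficient without its mean-difference term; the function name and data are hypothetical.

```python
import math
import statistics


def rho_p(x, y):
    """Assumed form: 2 cov(log x, log y) / (var(log x) + var(log y))."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / (len(lx) - 1)
    return 2 * cov / (statistics.variance(lx) + statistics.variance(ly))


# Hypothetical positive measurements across five conditions.
x = [2.0, 4.0, 8.0, 16.0, 32.0]
y_proportional = [3.0 * v for v in x]   # y proportional to x
y_reciprocal = [1.0 / v for v in x]     # y proportional to 1/x
y_other = [5.0, 7.0, 20.0, 30.0, 90.0]

print(f"proportional pair: {rho_p(x, y_proportional):+.3f}")
print(f"reciprocal pair:   {rho_p(x, y_reciprocal):+.3f}")
print(f"other pair:        {rho_p(x, y_other):+.3f}")

# Under the assumed form, the coefficient is also 1 - var(log(x/y)) / (var(log x) + var(log y)).
lx = [math.log(v) for v in x]
ly = [math.log(v) for v in y_other]
via_logratio = 1 - statistics.variance([a - b for a, b in zip(lx, ly)]) / (
    statistics.variance(lx) + statistics.variance(ly))
```

Unlike φ, this quantity is symmetric in x and y, since swapping the two variables leaves both numerator and denominator unchanged.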

Centered logratio (clr) representation

However, to ensure that the φ values for component pair (i, j) are on the same scale as (i.e., comparable to) the φ values for component pair (m, n), it is necessary to use the centered logratio (clr) transformation instead of just the logarithm (S1 Supporting Information). The clr representation of composition x = (x1, . . ., xi, . . ., xD) is the logarithm of the components after dividing by the geometric mean of x:

$$\operatorname{clr}(x) = \left(\log\frac{x_1}{g_m(x)}, \ldots, \log\frac{x_i}{g_m(x)}, \ldots, \log\frac{x_D}{g_m(x)}\right)$$

ensuring that the sum of the elements of clr(x) is zero. Note that dividing all components in a composition by a constant (i.e., the geometric mean gm(x)) does not alter the ratios of components.
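The clr transformation described above takes only a few lines. This is a minimal sketch with a hypothetical four-part composition; it uses the fact that the logarithm of the geometric mean is the arithmetic mean of the logarithms.

```python
import math


def clr(composition):
    """Centered logratio: log of each part divided by the geometric mean."""
    logs = [math.log(v) for v in composition]
    log_gm = sum(logs) / len(logs)  # log of the geometric mean gm(x)
    return [value - log_gm for value in logs]


x = [10.0, 20.0, 30.0, 940.0]        # hypothetical composition
scaled = [0.001 * v for v in x]      # the same composition on another scale

z = clr(x)
z_scaled = clr(scaled)

print("clr(x)          =", [round(v, 4) for v in z])
print("sum of clr(x)   =", round(sum(z), 12))
print("log(x1 / x2)    =", round(math.log(x[0] / x[1]), 4))
print("clr1 - clr2     =", round(z[0] - z[1], 4))
```

The elements sum to zero, rescaling the composition leaves the result unchanged, and differences between clr values reproduce the log ratios of the original parts.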

Using gb to form co-expression networks and clustered heatmaps

Gene co-expression networks [12, 13] are generally based on a pairwise distance or dissimilarity matrix which is often a function of correlation and thus not appropriate for relative data. Proportionality is appropriate, but φ does not satisfy the properties of a distance; most obviously, it is not symmetric unless β = 1: in general, φ(log x, log y) ≠ φ(log y, log x).

This symmetrised form of φ was then used to lay out a network of the 145 mRNAs that were involved in 424 pairwise relationships with φ < 0.05. We used the symmetrised form of φ as the basis of the cluster analysis and heatmap expression pattern display (e.g., S10 Fig.) described by Eisen et al. [14].
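The overall workflow (clr-transform, compute pairwise proportionality, keep pairs below a cutoff, read off clusters) can be sketched as follows. This is an illustration under stated assumptions, not the analysis code in the Supporting Information: the data are hypothetical, the symmetrisation used here (the smaller of the two directed values) is one possible choice and may differ from the one used in the paper, and clusters are taken to be connected components of the resulting network.

```python
import math
import statistics


def clr_rows(data):
    """clr-transform each observation (row) of a samples-by-components table."""
    out = []
    for row in data:
        logs = [math.log(v) for v in row]
        centre = sum(logs) / len(logs)
        out.append([value - centre for value in logs])
    return out


def phi(a, b):
    """phi(a, b) = var(a - b) / var(a) for already log-transformed columns."""
    diff = [u - v for u, v in zip(a, b)]
    return statistics.variance(diff) / statistics.variance(a)


def proportional_clusters(data, cutoff=0.05):
    """Edges with symmetrised phi below cutoff, and the clusters they form."""
    cols = list(zip(*clr_rows(data)))
    d = len(cols)
    # Assumed symmetrisation: the smaller of phi(i, j) and phi(j, i).
    edges = [(i, j) for i in range(d) for j in range(i + 1, d)
             if min(phi(cols[i], cols[j]), phi(cols[j], cols[i])) < cutoff]
    parent = list(range(d))  # union-find over components

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(d):
        groups.setdefault(find(i), []).append(i)
    return edges, sorted(g for g in groups.values() if len(g) > 1)


# Hypothetical data: rows are conditions; component 1 is always twice component 0.
data = [
    [10.0, 20.0, 5.0, 100.0],
    [20.0, 40.0, 50.0, 90.0],
    [40.0, 80.0, 7.0, 300.0],
    [80.0, 160.0, 90.0, 20.0],
    [160.0, 320.0, 11.0, 500.0],
]
edges, clusters = proportional_clusters(data)
print("edges:", edges)
print("clusters:", clusters)
```

Only the exactly proportional pair survives the cutoff in this toy table; with real data the cutoff governs how many pairs, and hence how large a network, are retained.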

Supporting Information

Times 0 and 3 are highlighted for further study. (EPS)

Values have been scaled and translated to have zero mean and unit variance. Upper panels show absolute abundances; the lower show relative abundances. The left panels show mRNA values over time; the right show the value of one mRNA plotted against the other at each time point. As with Fig. 3, the correlation between the relative abundances is almost the complete opposite of that between the absolute abundances of this pair of mRNAs.

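A 2D histogram of the correlation coefficient observed for the relative abundances of a given pair of mRNAs in a sample where the ten most abundant mRNAs have been removed, against the correlation coefficient observed for the relative abundances of that same pair, over all pairs. White contour lines are shown at intervals of 100 counts. While the distribution of the correlation coefficient pairs lies more on the diagonal than in the preceding figure, it is clear that correlation of relative abundances is sensitive to what is in (or out of) the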

A 2D histogram of φ(clr(xi), clr(xj)) for the relative abundances of a given pair (i, j) of mRNAs, against the correlation coefficient observed for the absolute abundances of that same pair, over all pairs. The red and blue points correspond to the red and blue pairs of mRNA in Fig. 2. White contour lines are shown at intervals of 100 counts and the top marginal histogram is the same as in S2(b) Fig. The few mRNA pairs that are strongly proportional (within the red rectangle) are also strongly positively correlated. However, the converse is not true: strong positive correlation between mRNAs does not imply that they are strongly proportional.

S8 Fig. The relative abundances of each of the mRNAs from the 96 mRNA cluster seen in

The geometric mean at each timepoint is shown in blue. (EPS)

(EPS)

The detailed and reproducible analysis reported in this paper. This PDF file is the output obtained by executing SupplementaryInfo.Rnw from S2 Supporting Information. In addition to all the figures and results in the manuscript it provides additional detail and information for those interested in understanding more about compositional data analysis and the analyses we have conducted.

R code and data to reproduce this paper’s analysis. This zip file contains SupplementaryInfo.Rnw, the Sweave source which is executed to analyse the contents of the ./data folder and present the results in S1 Supporting Information. (ZIP)

Author Contributions

Performed the experiments: JB SM. Analyzed the data: DL JJE VPG. Wrote the paper: DL VPG JJE SM JB. Developed the data analysis method: DL VPG JJE.
