SciSurf: Index of 'Interplay between Constraints, Objectives, and Optimality for Genome-Scale Stoichiometric Models'

Interplay between Constraints, Objectives, and Optimality for Genome-Scale Stoichiometric Models

Timo R. Maarleveld, Meike T. Wortel, Brett G. Olivier, Bas Teusink, Frank J. Bruggeman

Published in PLOS Comp. Biol., April 2015

Abstract

The computation of all feasible metabolic routes with these models, given stoichiometric, thermodynamic, and steady-state constraints, provides important insights into the metabolic capacities of a cell. How the feasible metabolic routes emerge from the interplay between flux constraints, optimality objectives, and the entire metabolic network of a cell is, however, only partially understood. We show how optimal metabolic routes, resulting from flux balance analysis computations, arise out of elementary flux modes, constraints, and optimization objectives. We illustrate our findings with a genome-scale stoichiometric model of Escherichia coli metabolism. In the case of one flux constraint, all feasible optimal flux routes can be derived from elementary flux modes alone. We found up to 120 million of such optimal elementary flux modes. We introduce a new computational method to compute the corner points of the optimal solution space fast and efficiently. Optimal flux routes no longer depend exclusively on elementary flux modes when we impose additional constraints; new optimal metabolic routes arise out of combinations of elementary flux modes. The solution space of feasible metabolic routes shrinks enormously when additional objectives—eg. those related to pathway expression costs or pathway length—are introduced. In many cases, only a single metabolic route remains that is both feasible and optimal. This paper contributes to reaching a complete topological understanding of the metabolic capacity of organisms in terms of metabolic flux routes, one that is most natural to biochemists and biotechnologists studying and engineering metabolism.

Author Summary

Organisms depend on huge networks of molecular reactions for environmental sensing, information integration, gene expression, and metabolism. The discovery of general principles of network behavior is a major ambition of systems biology and of great interest to biotechnology and medicine. We present a computational tool that calculates all optimal states of metabolism in terms of pathways, which is arguably the most intuitive and preferred approach to characterize whole-cell metabolism. We show how the space of all feasible flux distributions can be compactly described in terms of a unique set of minimal and feasible pathways, given realistic stoichiometric, thermodynamic, and optimization-objective constraints. This description clarifies the interplay between flux constraints and optimization objectives. We eXplain why some fluxes are variable and cross-correlate within the solution space while others do not and how multi-objective optimization shrinks the solution space. We illustrate our findings with a toy metabolic model to eXplain the main insights and apply it to a genome-scale stoichiometric model of Escherichia coli metabolism.

Introduction

Genome-scale stoichiometric models of metabolism [1, 2] and the availability of annotated genome sequences have greatly accelerated metabolic research. The combined use of high-throughput metabolo-mics data, comprehensive protocols [3], and (automated) reconstruction tools [4] has resulted in an explosion in the number and size of genome-scale stoichiometric metabolic models [5, 6]. Constraint-based modeling has become an indispensable tool to deal with these large models, used in biotechnology [7, 8] and medicine [9, 10].

The accuracy of FBA predictions depends on the availability of realistic flux constraints, which can be derived from experimental data. Generally, there are insufficient flux constraints to obtain a single unique solution and a large space of optimal flux distributions results [14—16]. These alternative flux distributions give an impression of the robustness of a metabolic network [17] , but not every alternative is equally favorable for the organism. In some environments organisms are strongly selected for yield, almost regardless of the protein burden, while in other environments the protein burden has a significant impact. The solution space can be analyzed further with secondary objectives [18—22] , e.g. minimization of the number of active fluxes [23] or the sum of absolute fluxes [24], which have been used as proxies for maximization of the protein expression efficiency and minimization of the protein burden, respectively.

Several approaches were proposed to give insight into the geometry of the optimal solution space [14, 15, 25—28] , which is mathematically represented by a polyhedron [29]. Flux Variability Analysis (FVA) [14] and Flux Coupling Analysis (FCA) [25] provide valuable information on the boundaries of the solution space, but do not give understanding in terms of metabolic routes. Such an understanding would be extremely helpful, as most biologists intuitively think in terms of metabolic routes.

The recently developed method, CoPE-FBA (Comprehensive Polyhedron Enumeration FBA) [16], enumerates the vertices, the corner points of the optimal solution space. The number of Optimal Solution Space Characterization: Vertices Rays Linealities (optimal flux vector) (irreversible cycle) (reversible cycle)

Characterization of the optimal solution space of metabolic models. The optimal solution space can be characterized by three topological features: veriices (purple), rays (green), and linealities (blue). Typically, optimal solution spaces of microbial genome-scale models are characterized by many veriices and only a few linealities and rays. Linealities do not exist when reversible reaction are split. Vertices can be described by a fixed active part (red) which is identical for each vertex and a variable part (orange), a few CoPE-FBA subnetworks [16]. We refer to 81 Fig. for examples of rays we found in the E.coli iAF1260 genome-scale metabolic model.

1). This method provides the structural insights that FVA and FCA lack, and explains the typical combinatorial explosion of the vertices; the optimal solution space can easily have millions of vertices that arise from independent combinations of alternative flux routes through only a few, small segments of the metabolic network. However, CoPE-FBA suffers from computational difficulties, it is slow, and—perhaps more important—the provided solution does not yield all non-decom-posable flux routes in the optimum, limiting the use of CoPE-FBA. For instance, it cannot be used to assess the influence of secondary objectives on the solution space.

We uniquely characterize the optimal solution space by adjusting CoPE-FBA to split each reversible reaction into two irreversible reactions; this yields all non-decomposable flux routes in the optimum. We start by illustrating the differences between the CoPE-FBA outcomes of metabolic models with and without revers-ible-reaction splitting. Next, we explain the relationship between these vertices and elementary flux modes (EFMs) with an optimal substrate-product yield. Finally, we show that secondary objectives typically collapse the optimal solution space to a unique solution (a vertex) or to a small set of vertices, using the iAF126O genome-scale model of Escherichia coli metabolism. Enumerating all non-decomposable flux routes in the optimum requires a more efficient computational method, Which we also present in the Methods section of this work. This results in CoPE-FBA 2.0, our tool of choice for analyzing the optimal solution space in terms of network topology.

Results

Characterization of the optimal solution space: Illustration with a toy model

Our toy network consists of 18 metabolites and reactions where the source metabolite X and sink metabolite Y are considered boundary metabolites. All reactions, besides the reactions where ATP and ADP act as cofactors, are isomerization (uni-uni) reactions and reversible reactions are illustrated by two headed arrows.

To constrain the solution space we used one inequality constraint, I1 3 2. Throughout this work, we call this type of (inequality) constraint a restricting nonzero flux constraint. The resulting FBA is formulated as the linear program: where N] = 0 is the steady-state constraint with N as stoichiometric matrix and I as flux vector (or flux pathway). Simple metabolic models can be optimized by hand, but linear programming is required for the solution of any realistic genome-scale model. FBA optimization confirmed that, for this set of capacity constraints, maximization of our objective function gives I18 2 1.

Several flux pathways maximize the objective I18 2 1, i.e. our FBA model is underde-termined. We can describe the optimal solution space with the Minkowski sum (see Equation (4)) in terms of three mathematical objects: linealities, rays, and vertices (Fig. 1) [29, 30]. Each of these mathematical objects relate to a topological motif in a metabolic network.

Linealities will cease to exist when we split each reversible reaction into two irreversible reactions later. The set of reactions R2, R3, R4, i.e. {R2—R4}, form a lineality (Fig. 2A). The reactions can take any value, but there must be a net flux of two through {R2—R4} that converts A into B. Rays are irreversible (thermodynamically infeasible) cycles or input-output pathways. Our toy model does not contain a ray. If at least one of the reactions R2, R3 or R4 would have been irreversible, {R2—R4} would have been a ray (Fig. 2A).

Convex combinations of neighboring vertices form the facets—“edges”—of this polyhedron. Our toy model contains four vertices termed V1—V4 (Fig. 2B). We can use a combination of these vertices and cyclic networks to represent every optimal flux vector.

In the optimum, these subnetworks consist of reactions with correlated flux variability and have a fixed net input-output stoichiometry of reactants and products. The toy model contains two subnetworks (Fig. 2C). In the subnetwork consisting of {R6—R10}, the alternative flux distribution {R6—R8} is anti-correlated with alternative flux distribution {R9—R10} and the reactions within these sets are positively correlated with each other (S2 Fig). Both sets of reactions have an identical input-output relationship: D + ADP —> H + ATP. We can multiply the number of alternative flux distributions through each subnetwork to obtain the total number of vertices; both subnetworks of the toy model have two alternative flux distributions—the lower and the upper branch—which gives a total of four vertices (Fig. 2B).

The representation of the network in its current form—where reversible reactions are not split into two irreversible reactions—complicates the characterization of the optimal solution space; not all non-decom-posable optimal pathways are vertices and the set of vertices is not unique. This set of vertices corresponds to the minimal generating set of the optimal solution space, a well-known concept in EFM analysis [31]. For our toy model, we enumerated all (optimal-yield) EFMs to illustrate the difference between the set of vertices we enumerated with CoPE-FBA and the set of non-decomposable flux pathways in the optimum.

EFM analysis showed that our toy model has thirteen EFMs of which twelve are operational modes that produce Y with an optimal yield (see also S3 Fig. for a more detailed analysis). Thus, for only one third of the optimal-yield EFMs, there exists a corresponding vertex (S3B Fig). The remaining eight EFMs do not have corresponding vertices, because: (i) two non-decomposable optimal pathways are a convex combination of different vertices; e.g. 1/2 V1 (EFM1) + 1/2 V2 (EFM3) = EFM2 (Fig. 3A) and (ii) six more optimal pathways are a linear combination of vertices (or of the two convex combinations) and the lineality (with flux through {R3, R4} rather than {R2}). Because of these additional non-decomposable optimal pathways, we cannot find all the possible pathways that optimize a (secondary) objective directly from the vertex representation—we shall return to this later.

Our toy network was designed to not contain suboptimal operational modes. The nonoperational mode is the line-ality shown in Fig. 2A.

For instance, CoPE-FBA gave that {R2} was part of each vertex and {R3, R4} was only part of a lineality. We can reverse this situation by making {R2} part of the lineali-ty and {R3, R4} part of each vertex, hence this decomposition is not unique.

To solve the issues with uniqueness and completeness, we exploited an existing technique in the field of EFM analysis: Splitting reversible reactions into separate forward and backward reactions [31, 32]. In the split model, the vertices are instances of exactly the non-de-composable pathways in the optimal state. Since all reactions are irreversible after the splitting procedure, linealities do not exist anymore. The split toy model contains seven rays, because each split reversible reaction forms an additional ray and two more rays from {R2—R4} in forward and backward direction (see Fig. 4A). After splitting, rays no longer signify thermody-namically-infeasible irreversible cycles. Each forward and backward reaction together forms a new EFM, but the set of optimal-yield EFMs is identical before and after splitting (S3A and S3D Fig), which was also proven by Gagneur and Klamt [33].

This means that our toy model contains now twelve rather than four vertices. To explain the difference between the set of vertices with and without splitting, we first focus on {R2—R4}. With splitting, the model contains vertices with both {R2} and {R3, R4}, while without splitting vertices only contain either {R2} or {R3, R4}. The variability in reactions {R2—R4} causes the number of vertices to double (2 x 2 x 2 vs.

4B) in the split model. The second difference originates from the CoPE-FBA subnetwork described by {R12—R15}. The flux distribution through {R12, R15} (see also EFM2 Fig. 3A) cannot be obtained via a convex combination of alternative flux distributions and is, therefore, now also a vertex. This difference causes the number of vertices to further increase from eight to twelve (2x2x3 vs. 2x2x2).

After reversible-reaction splitting, the set of optimal-yield EFMs corresponds to the set of vertices if there is only a single nonzero restricting flux constraint (for proof see 81 Text). With more constraints this is not necessarily the case, because EFMs are based on stoichiometry and thermodynamics alone, while vertices also depend on flux constraints. We illustrate this by discussing examples of different types of flux constraints on the sets optimal-yield EFMs and vertices: setting a flux to zero, adding a restricting constraint, and adding a demanding constraint.

The set of optimal-yield EFMs still corresponds to the set of vertices (of course all pathways using the removed flux, e.g. oxygen uptake, are absent). As an example, removing R15 from our toy network would result in the same set of four optimal-yield EFMs and vertices.

With this constraint, I18 2 1 cannot be achieved with EFMs that include reaction R15 (e.g. EFM2 and EFM3 shown in Fig. 3A). The corresponding vertices are infeasible, because vertices are only defined in the optimal space (see also 83C and S3F Fig. for a more detailed analysis). The corner points of the new optimal solution space are now described by a different set of vertices, i.e. still feasible vertices and vertices that arose after adding the second constraint. Each newly introduced vertex arose from an infeasible vertex and a neighboring feasible vertex. An example of such a vertex is given in Fig. 3B, which corresponds to a convex combination of EFMl and EFM2 or EFMl and EFM3. In this example, the number of corner points of the optimal solution space decreased after adding the second flux constraint, but this is not a general outcome.

An optimal-yield EFM through the demand reaction is then added to each vertex. The number of vertices increases if multiple optimal-yield EFMs coexist through the flux demanding reaction (which will then form another CoPE-FBA subnetwork).

In this section, we demonstrate how secondary optimization simplifies after reversible-reaction splitting and that secondary optimization reduces the solution space to only one or a few vertices. As a secondary optimization objective, we used minimization of the number of active fluxes—hereafter pathway length PL; see Equation (8). Mathematically, we can write this secondary optimization as follows: in which we set the output flux of the toy model to its maximal value obtained with the previous FBA, shown in Equation (1). We used mixed-integer linear programming to select the flux pathway with the minimal PL from the optimal solution space for one (I1 3 2) and two (I1 3 2, I15 3 0.5) restricting flux constraints, which gave a minimal PL of 11 and 12, respectively. Next, we determined the PL for each optimal-yield EFM and vertex of the “non-split” and “split” model (see S4 Fig. for more details).

Therefore, for the optimization of secondary objectives in the non-split model, we cannot focus solely on the vertices. We have to take into account the whole optimal solution space—the Minkowski sum of vertices, rays, and linealities (for details see Methods)—which is cumbersome.

This makes them independent from the chosen growth medium and objective function. However, without splitting, linealities and/ or rays can influence secondary objectives when they share reactions with one or more vertices. In this case we can construct non-decomposable optimal flux pathways that are not vertices by taking, for instance, a linear combination of a vertex with a connected lineality. An example is the lineality described by {R2—R4} (Fig. 2A). Adding {R2, R3, -R4} to one of the four vertices gives rise to a new optimal flux pathway; one with more active reactions.

A reaction becomes inactive when it goes in different directions with the same flux in alternative flux distributions. When I1 3 2 is the only constraint, a conveX combination of alternative flux distributions in {R12—R15} (Fig. 2C) shortens PL by one reaction: Spe-cif1cally, R13 and R14 carry flux in both alternative flux distributions whereas in different directions. This analysis shows that the minimal PL is a conveX combination of vertices V3 and V4 (Fig. 2B). This optimal pathway becomes infeasible when we add the second flux constraint I15 3 0.5; then, verteX V4 minimizes PL.

The shortest optimal flux pathway is always a verteX (and corresponds to an optimal-yield EFM if it is restricted by only one nonzero flux constraint). Theoretically, multiple shortest vertices can coexist. The fact that only a verteX or several vertices optimize the secondary objective is a specific result for pathway length minimization (see also SS Fig.). For instance, the optimal solution space after minimization of the sum of absolute fluxes as secondary objective (Equation (10)) can consist of a line or a plane, besides (multiple) single point(s).

Reversible-reaction splitting has many advantages for the characterization of the optimal solution space. We first summarize those advantages before we set out to analyze a genome-scale stoichiometric model. Splitting of the reversible reactions leads to: 1. The vertices discovered with CoPE-FBA are all possible non-decomposable pathways in the optimal state. For the analyses of optimal flux pathways, we can, therefore, focus solely on vertices. These vertices can be compactly described by a set of subnetworks that describe all the variability in non-decomposable optimal flux pathways. 2. A unique characterization of the optimal solution space. 3. Secondary optimization yields an optimal solution space consisting of one or multiple vertices. 4. Rays no longer signify thermodynamically-infeasible irreversible cycles.

We identified three different mechanisms that contribute to the increase in vertices: (i) splitting can yield additional CoPE-FBA subnetworks that originate from rays or linealities with a input-output relationship different from zero. An example is the lineality given by {R02—R04} (Fig. 2A) that is a subnetwork with splitting (Fig. 4B). (ii) optimal flux pathways that are conveX combinations of vertices before splitting become vertices after splitting. We encountered such a case in the toy model where the convex combination of vertices V3 and V4 resulted in an additional verteX after splitting. (iii) rays or linealities connected to CoPE-FBA subnetworks give rise to additional vertices. Imagine for the toy network, for instance, a reversible reaction that converts metabolite F into metabolite E (the reverse of reaction R7). Before splitting, R7 and this newly introduced reaction form a ray. After splitting, an additional verteX exists through this newly introduced reaction. Enumeration of many more vertices requires more computational power, hence we developed a much more efficient method, CoPE-FBA 2.0, for enumeration of the optimal solution space of both toy and genome-scale models, which is described in detail in the Methods section. The enumeration requires now minutes to hours rather than days to weeks to complete.

A real life example: Escherichia coli growing on glucose

By modifying the oxygen uptake constraint, we constructed three different FBA models of iAF126O that depict aerobic, aerobic restricted, and anaerobic growth (for details see Methods). We set maximization of biomass production rate as the objective function. For these growth conditions, general CoPE-FBA results for both the model with and without reversible-reaction splitting are shown in Table 1. With splitting we found for each growth condition many more vertices (up to 120 X 106). Since we also used an ATP maintenance demand constraint of 8.39 mmol gDW‘1 11—1, our vertices are not instances of EFMs. Without this constraint, all vertices in both the aerobic and anaerobic growth condition are instances of their corresponding EFMs (not shown). We found only a few CoPE-FBA subnetworks which together completely reveal the variability in the vertices. Most reactions are inactive after optimizing the objective function (SI Table). In the remainder of this section, we use the results obtained from the model with splitting because this yielded all non-decomposable flux pathways in the optimum.

We studied the distributions of objective values of a secondary optimization over the vertices obtained in the first optimization. For each vertex, we determined the pathway length PL, pathway sum of absolute fluxes P], and pathway cost PC (see Methods). Similar to work done by Shlomi et al. [10] , our protein cost definition was solely based on enzyme-synthesis cost. For instance, we did not take the protein lifetimes into account. Ignoring protein lifetimes implies that P] and PC are closely related; PC is taking P] multiplied with a protein cost for each individual reaction. In Fig. 5, we thus only show the results for PL and PC. Initially, we intuitively expected many vertices with intermediate PL, P], and PC, and few with relatively low or high PL, P], and

In other words, we expected a Gaussian-shaped distribution for both PL, P], and PC. As expected, the PL was indeed Gaussian-shaped distributed for all tested growth conditions. This is illustrated by the dashed black lines in the top panel of Fig. 5 which correspond to a Gaussian distribution where we used the sample mean and standard deviation as input.

An accurate determination of pathway cost is a challenge and we hypothesized that a different cost function could show a different distribution. Therefore, we investigated four different definitions of protein cost: minimum, maximum, average, and equal (i.e. sum of absolute fluxes; P1). When multiple proteins were associated to a particular reaction via an OR rule, the minimum, maximum or average was taken (for more details see Methods). In all cases, we found a multimodal distribution. Nonetheless, we did find an effect of the cost function; taking the “minimal” cost function typically resulted in the largest difference between both clusters, while taking the “equal” cost function typically resulted in the smallest difference between both clusters (82 Table, S6 Fig). These results show that for explaining the multimodal distribution of vertex cost, the effect of fluxes was much more important than the effect of protein costs.

Hence, different subsets of all non-decomposable flux pathways in the optimum can be found. To demonstrate the possible differences, we enumerated the Ecoli iAF126O model including reversible reactions for the same conditions of which results are shown in S7 Fig. Comparison with Fig. 5 shows that during aerobic growth conditions only two rather than four clusters were found.

CoPE-FBA subnetwork analysis revealed the shape of the distributions of the length, sum of absolute fluxes, and cost of vertices. Several different CoPE-FBA subnetworks contributed to the total length difference and within the majority of these subnetworks we found alternative flux distributions with different lengths; the length distribution within the subnetworks were uniform or already Gaussian in shape. It is, therefore, not possible to reconstruct many vertices with a short or long vertex length, which explains the Gaussian-shaped distribution of vertex length.

Within these subnetworks, we found two distinct modes—a relatively cheap and a relatively expensive mode. While some of these subnet-works were relatively large, our results show that the main cost difference in this particular subnetwork originates from only a few metabolic reactions (S3 Table). As a consequence, we found many vertices with a relatively low PC and many vertices with a relatively high PC—a multimodal distribution of vertex cost. In the aerobic growth conditions, the cost difference mainly emerged from using different electron acceptors for the NADH dehydrogenase; cheap pathways used ubiquinone-8 and costly pathways used menaquinone-8 and/ or demethylmena-quinone-8. Interestingly, in aerobic growth conditions ubiquinone-8 is the major quinone in E. coli [35, 36]. In anaerobic growth conditions, the cost difference mainly emerged from exploiting a different strategy for the ATP-dependent conversion from PEP and F6P to DHAP, G3P, and PYR in main carbon metabolism (88 Fig).

For the E. coli iAF126O model with splitting, secondary optimization reduced the solution space to only one or a few vertices (Table 2). In case of PL-minimization, only vertices can be optimal solutions, since convex combinations increase the number of active reactions. Compared to Table 2. Secondary optimization can reduce the optimal solution space to a unique flux distribution.

When we minimize pathway cost (PC) a unique optimal flux distribution exists, but many suboptimal flux distributions do exist (Fig. 5).

This was expected because PL is solely based on the number of active reactions, specific flux values are not of interest. Taking these flux values into account typically results in more diverse outcomes. Hence, it is less likely to find as many vertices with a minimal P]. Similarly, adding different protein costs to each reactions further diversifies these outcomes. As a result, the optimal solution space for PC-minimization resulted in a unique flux distribution for all tested growth conditions.

Discussion

We further developed this method: Rather than enumerating the minimal generating set, we used reversible-reaction splitting [31, 32] to enumerate all non-decom-posable flux pathways in the optimum. This allows us to focus solely on the vertices for the analysis of optimal flux pathways.

Therefore, we also developed an efficient computational method, CoPE-FBA 2.0, for the (unique) characterization of the optimal solution space. We can now characterize the optimal solution space in the order of minutes for most (bacterial) genome-scale models on just an ordinary computer. CoPE-FBA 2.0 is efficient because it first determines the subnetworks and subsequently enumerates the vertices for each subnetwork (see Methods for more details). To illustrate this, the 120 - 106 vertices enumerated for E. coli under aerobic growth conditions originate from eight subnetworks with respectively 6, 3, 5184, 3, 2, 54, 2, 2 vertices. This means that while we determined in total only 5256 vertices (the sum), we actually enumerated 120.932.352 vertices (the multiplication) within 15 minutes on an ordinary computer.

We recall that the vertices correspond to optimal-yield EFMs if there is only a single restricting flux constraint. Both restricting and demanding flux constraints modify the (optimal) solution space. Typically, the optimization problem remains underdetermined and an optimal solution space will continue to exist. We can get a unique solution by adding additional constraints that concern all flux values in the model (e.g. protein cost constraints). Then, the optimal state is an instance of an optimal-yield EFM if there is only a single restricting flux constraint. Alternatively, the optimal state corresponds to a conveX combination of optimal-yield EFMs. For this reason, we can also use CoPE-FBA 2.0 to quickly enumerate all optimal-yield EFMs, which can be useful because enumerating the complete set of EFMs of a genome-scale model is a laborious undertaking [37, 38].

While these objectives have been used frequently to find more realistic FBA outcomes [18—22], we showed that for both minimization of PL and PC, optimal solution spaces continue to exist (Table 2). This result shows that we should be careful drawing conclusions from predicted flux distributions after using a secondary objective. Using CoPE-FBA with only irreversible reactions allows for a straightforward identification of the origin of the remaining solution space. Specifically, these solution spaces originate from identically favorable pathways through CoPE-FBA subnetworks. Similarly, we can use these CoPE-FBA subnetworks to directly explain the differences after secondary optimization, as we showed for the multimodal distributions of vertex cost.

We found Gaussian-shaped and multimodal distributions for PL and PC, respectively. These results can be further used to deduce a hypothesis of the selection pressure if we know the flux distribution. If the objective of E. coli would be to minimize PC (or P;), we would not expect E. coli to exploit the unique optimal solution since the difference between this optimal solution and many suboptimal solutions is almost negligible. We do, however, expect E. coli to exploit the “cheap” reactions that cause the bi or multimodal distribution of vertex cost. Interestingly, in aerobic conditions, our analysis predicted that all cheap pathways exploit ubi-quinone-8 which is also the major electron acceptor in E. coli under these conditions [35, 36]. If the objective of E. coli would be to minimize PL, many different flux vectors give rise to an optimal or near-optimal solution. The multitude of optimal solutions hinders the construction of a hypothesis about PL from individual reactions. In future research we see as possible application of our method, finding the minimal flux distance between alternative optimal flux vectors in different conditions, to answer questions about how species can adapt to changing conditions.

This paves the way to answer biological questions about the flexibility of organisms while growing at optimal states in a fast and straightforward manner. This work, therefore, contributes to reaching a topological understanding of metabolic functionality in the optimum in terms of metabolic flux pathways. In the future, the development of graphical maps [39] can further simplify the analysis by allowing for straightforward visualization and inspection of these metabolic flux pathways.

Methods

Flux balance analysis

The representation of a metabolic network in a stoichiometric model is particularly useful for constraint-based modeling techniques like Flux Balance Analysis (FBA). By using a linear programming approach, FBA can optimize (maximize or minimize) an objective function subject to the steady-state constraint, thermodynamic constraints, and capacity constraints: Here, 0 is a vector of coefficients that represent the contribution of each flux in vector I to the objective function Zobj. Next, N I = 0 is the steady-state constraint. Finally, Imi" and Im‘“ specify the minimal and maximal flux values for each reaction. In addition to providing a unique optimal outcome of the objective function, FBA provides a corresponding optimal flux distribution 101”. Most FBA models are underdetermined systems, and therefore, many corresponding 101’ t exist. For more details about FBA, we refer to Orth et a1. [12]. Minkowski sum The Minkowski sum given in Equation (4) provides the description of any 101” in terms of vertices, rays, and linealities [16, 29].

Additionally, s, t, and u represent the upper boundaries of the sum functions indicating the number of vertices, rays, and linealities, respectively. Furthermore, ak, fik, and yk represent the weighting coefficient that satisfy the following constraints: 22:1 06k = 1, ak 2 O, fik 2 O, and yk can take any value. In words, vertices can be summed by a conveX combination, rays can be summed as a conical combination, and linealities can be summed as a linear combination. This Minkowski sum alters (Equation (5)) once we split each reversible reaction into two irreversible reactions, because linealities do not exist anymore. Each split reversible reaction fulfills all conditions for a ray, thus many more rays are found When we split each reversible reaction. Each of these additional rays is also an EFM and an eXtreme pathway Which are considered irrelevant because they only reformulate reversibility [32, 40].

CoPE-FBA subnetworks and F-modules

The set of vertices yields a flux space Without futile cycles. CoPE-FBA subnetworks are defined Within this flux space and have a fixed input-output relationship, Which we can write mathematically as:

Subsequently, N A and I A are the stoichiometric matrix and the flux vector of the subnetwork, and d is the fixed input-output relationship of the subnetwork. We can also calculate subnetworks (modules) in a flux space

These subnetworks are called F-modules and can be determined Via FluXMo-dules [42]. We can distinguish two types of F-modules: F-modules not essential for optimality are rays or linealities not connected to optimal flux pathways. An F-module essential for optimality can be a CoPE-FBA subnetwork, a ray or line-ality connected to optimal flux pathways, or a combination of those.

CoPE-FBA 2.0 pipeline

2012 [16] developed the CoPE-FBA pipeline developed to characterize the optimal solution space in terms of vertices, rays, and linealities. Enumeration of genome-scale models Without reversible-reaction splitting can take already several days With this computational method. We developed a new pipeline, CoPE-FBA 2.0, to make this enumeration less memory and CPU intensive. First, we preprocessed the model as also described by Kelk et al. [16]. Then, we executed the following steps: 1. Determine F-modules and extract the fixed network. We used a Python implementation of FluxModules to quickly determine the F-modules.

Determine d for each F-module. To circumvent numerical issues we used rational FBA (QSopt_EX version 2.5.0 [43]) to determine d. FBA output was also used to set the values of the fixed network.

Reconstruct F-module models. For each F-module we reconstructed a model that consisted only of the reactions and metabolites of the F-module. We added input and output reactions to fix (1 of each F-module essential for optimality in the optimal solution space. Dummy species were added both the input and output reaction to guarantee use of both reactions.

Perform CoPE—FBA as described in Kelk et al. 2012 [16] for each F-module. Enumeration on each F-module essential for optimality yielded all vertices. F-modules not essential for optimality were enumerated to determine the total number of rays.

Reconstruct network vertices. We merged fixed parts and the enumerated vertices for all subnetworks.

2012 [16]. The CoPE-FBA 2.0 pipeline and all data files used during these study are available for download from http:// memesa-tools.sf.net. Rank test

Consequently, each enumerated subnetwork verteX should correspond to an optimal-yield EFM of the subnetwork. We successfully used the rank test [44] to show that each enumerated “subnetwork vertex” v is an instance of an (optimal-yield) EFM. First, we determined the zero indices of v. Second, from the stoichiometric matriX N of the subnetwork we eliminated all columns with a zero indeX in v to create a submatriX an. Third, we used single value decomposition to determine the rank of NM. Last, we used the rank—nullity theorem to determine its nullity (Equation (7)), the dimension of the right nullspace which should be one if v is an EFM.

In addition, even if all vertices of all subnetworks were instances of optimal-yield EFMs, vertices describing the optimal pathways through the complete network do not have to correspond to EFMs. This is only true if there is one restricting nonzero flux constraint and no demanding flux constraints.

Secondary objectives

As a secondary objective, we used, in addition to pathway length (PL) and pathway sum of absolute fluxes (PI), also pathway cost (PC)—a proxy for the minimization of the ATP utilization in protein synthesis—t0 reduce the size of the solution space.

This cost is the scaled length of the proteins that were associated to this reaction, which we used as a proxy for all costs. In other words, cj < 1 when the associated protein length is smaller than average and vice versa. We set cj = 1 when no information about associated proteins was available. Because multiple proteins can be associated to a particular reaction via AND and OR rules, different definitions of cj were used: maXimum, average, minimum, and equal. An AND rule corresponds to taking the sum of protein lengths, while an OR rule corresponds to taking the maXimum, average, or minimum. Using equal cost is identical to minimizing the sum of absolute fluxes, a widely-used secondary objective. Taking the maXimum, average, minimum, or equal definition of cj did not effect the interpretation of our results.

Genome-scale models

Maximum glucose uptake was set to 12.77 mmol gDW‘1 11—1 and we modified the bounds on the 02 uptake reaction to create specific aerobic (no constraint on 02 uptake) and anaerobic (exchange of 02 set to zero) conditions. In all cases, the model required an ATP maintenance flux of 8.39 mmol gDW‘1 11—1. The model was edited and prepared for enumeration using PySCeS CBMPy [46, 47]. All models are provided as Supplementary Dataset Sl. Optimization of secondary objectives (minimization of PL, P], and PC) was also done with PySCeS CBMPy. We used a mixed-integer linear program to minimize PL and a linear program to minimize P] and PC.

Supporting Information

81 Dataset. A zip-file containing SBML files encoding the metabolic networks used in this 81 Fig. Examples of rays in the E. coli iAF1260 genome-scale metabolic model. (PDF)

(PDF)

Proof that the polytope of a single-constraint FBA has vertices that lie on EFMs. (PDF)

Most reactions carry no flux in the optimal solution. (PDF)

Flux is the main contributor to the observed cost difference between vertices. (PDF)

Cost explaining reactions for the three different growth conditions of E.coli iAF1260. (PDF)

Acknowledgments

Arne Reimers for providing a Python implementation of FluXModules and SURFsara (WWW. surfsara.nl) for the support in using the Lisa Compute Cluster.