SciSurf: Index of 'Stepwise Catalytic Mechanism via Short-Lived Intermediate Inferred from Combined QM/MM MERP and PES Calculations on Retaining Glycosyltransferase ppGalNAcT2'

Stepwise Catalytic Mechanism via Short-Lived Intermediate Inferred from Combined QM/MM MERP and PES Calculations on Retaining Glycosyltransferase ppGalNAcT2

Tomáš Trnka, Stanislav Kozmon, Igor Tvaroška, Jaroslav Koča

Published in PLOS Comp. Biol., April 2015

Abstract

To understand the process of protein glyco-sylation, the reaction mechanisms of the participating enzymes need to be known. However, the reaction mechanism of retaining glycosyltransferases has not yet been sufficiently explained. Here we investigated the catalytic mechanism of human isoform 2 of the retaining glycosyltransferase polypeptide UDP-GalNAc transferase by coupling two different QM/MM-based approaches, namely a potential energy surface scan in two distance difference dimensions and a minimum energy reaction path optimisation using the Nudged Elastic Band method. Potential energy scan studies often suffer from inadequate sampling of reactive processes due to a predefined scan coordinate system. At the same time, path optimisation methods enable the sampling of a virtually unlimited number of dimensions, but their results cannot be unambiguously interpreted without knowledge of the potential energy surface. By combining these methods, we have been able to eliminate the most significant sources of potential errors inherent to each of these approaches. The structural model is based on the crystal structure of human isoform 2. In the QM/MM method, the OM region consists of 275 atoms, the remaining 5776 atoms were in the MM region. We found that ppGaINAcT2 catalyzes a same-face nucleophilic substitution with internal return (SNi). The optimized transition state for the reaction is 13.8 kcal/mol higher in energy than the reactant while the energy of the product complex is 6.7 kcal/mol lower. During the process of nucleophilic attack, a proton is synchronously transferred to the leaving phosphate. The presence of a short-lived metastable oxocarbenium intermediate is likely, as indicated by the reaction energy profiles obtained using high-level density functionals.

Author Summary

These glycans are complex branched molecules assembled from monosaccharide units by a sophisticated cascade of enzymes from the group of glycosyltransferases. Disruptions in the synthesis of glycans are linked to various diseases with the most prominent example being cancer. To understand or control the process of glycosylation, the reaction mechanisms of the participating enzymes need to be known. Here we investigate the catalytic mechanism of human glycosyltransferase ppGalNAcT2 using the tools of computational chemistry. By modelling the crucial parts of the enzyme using a quantum mechanics-based description, we are able to trace the whole reaction path leading from the reactant state to the product state. Our results provide a reliable description of the motion of all important atoms during the reaction and they are fully consistent with available experimental data. The insights obtained in this study can be further used to design a potent inhibitor molecule, usable as a potential drug for diseases involving increased activity of the enzyme.

Introduction

[1—3] Glycans exist in a vast array of diverse structures built up from just a few small basic fragments. This can therefore be directly compared to the protein world, constructed purely from simple amino acids. However, in striking contrast to proteins, the structures of glycans are not encoded in any specific form analogous to the genome. [1] The so-called glycocode is just implicitly present in the regulation of hundreds of different highly specialized enzymes, glycosidases and glycosyltransferases, forming the glycosylation cascade. For this reason, understanding the reactivity of glycosyltransferases is essential to being able to decode the glycocode.

The reaction mechanism of inverting glycosyltransferases is well understood and both experiments and molecular modeling support a direct displacement SNZ-like mechanism with a protein amino acid functioning as a catalytic base. However, the same level of understanding has not yet been reached for members of the retaining group. A lot of scientific attention has been recently focused on this issue in an attempt to determine the reaction mechanism of retaining glycosyltransferases, with mixed results. [4, 5]

The first of them is the double-displacement mechanism, where the reaction is thought to proceed via two consecutive configuration-inverting nucleophilic substitutions, first forming a covalent enzyme-carbohydrate intermediate and then transferring the carbohydrate onto the acceptor molecule. In this mechanism, a suitably positioned amino acid residue functioning as the catalytic base is required and two enzymes, namely a-1,3-galac-tosyltransferase [6] (a3GalT) and blood-group A and B 0:—1,3-galactosyltransferase [7] were proposed to proceed with this mechanism. Theoretical calculations on truncated QM models [8] predicted this mechanism to be energetically possible. Later QM/MM calculations [9—1 1] also supported this mechanism.

Therefore, the other possible reaction mechanism, the “internal return-like”, also called the SNi-like mechanism, has been suggested for these enzymes. This mechanism does not require a nucleophilic residue to be present. In this case, the reaction can proceed either as a concerted mechanism via an oxo-carbenium ion pair transition state, or as a stepwise mechanism via a metastable intermediate that is subsequently captured by the acceptor nucleophile. [5] Compared with the double displacement mechanism, SNi substitution also seems to match the available kinetic isotope effect data. [12] The SNi-like mechanism was proposed for lipopolysaccharide a-1,4-galactosyltrans-ferase C (LgtC) [13] and supported by theoretical studies. [4, 14] Recent experimental evidence for the retaining glycosyltransferase, trehalose-6-phosphate synthase (OtsA) [12, 15] is consistent with the SNi mechanism and also supports the theory that the hydrogen bond between the phosphate group and the acceptor hydroxyl plays a role in stabilizing the transition state suggested by calculations. [14] However, the existence of a short-lived intermediate remains an open question.

All three studies used a hybrid QM/MM model of the entire enzyme and came to the same general conclusion that the SNi-like mechanism is the most probable one. Unfortunately, due to the substantial approximations used in these studies, many unanswered questions about the validity of their results remain.

[10] used a static approach of QM/MM geometry opti-misations constrained to points along a single predefined reaction coordinate, describing the difference in the lengths of the dissociating and newly forming CO bond. Obviously, this completely neglects the second transfer process taking place at the same time—the transfer of a proton from the acceptor hydroxyl moiety onto a base represented by the leaving phosphate group. This, together with the low resolution of the scanned coordinate, led to a sudden jump of the proton upon crossing the main reaction barrier, indicated by a sharp spike in potential energy. In the end, the resulting energy profile does not describe a minimum energy path on a single potential energy surface, but a combination of two unconnected path segments corresponding to the endpoint locations of the proton.

published another study [17] very similar to the LgtC one, focused on the ppGalNAcT2 glycosyltransferase. Although the conclusions presented there are again in agreement with theoretical expectations and available experimental findings, the ppGalNAcT2 study shares many of the methodological problems of the LgtC one. The authors have scanned potential energy along a single predefined reaction coordinate, using a very modest quantum-chemical description of the active site, namely the Becke-Perdew pure density functional together with a small basis (SVP) and a small quantum region (80 atoms). Just the basis set itself casts serious doubts on the usability of their results, as the authors themselves show that the related error in the potential energy barrier is at least 5 kcal mol‘1 (compared to TZVP basis), that is, about one third of the estimated barrier height. The influence of the simple density functional additionally seems to be of roughly the same magnitude. This can be related to the overall negative charge of the used QM region, as anionic systems are notoriously difficult to describe using pure density functionals due to a large self-interaction error. However, the most important shortcoming of the study in question is the fact that the authors were unable to find the transition state of the reaction, precluding any validation of the proposed reaction path. In contrast, the study on OtsA by Ardevol and Rovira [16] took a more rigorous approach, sampling both the nucleophilic substitution and proton transfer processes by means of two independent collective variables (CV). Their results are based on QM/MM Car-Parrinello molecular dynamics, using the metadynamics method to improve CV sampling and calculate free energy profiles, and support a single displacement with a two-step mechanism. [16] However, enhanced sampling methods like metadynamics provide correct free energy data only after the system reaches the regime of free diffusion along the reaction path. Unfortunately, computational resource constraints limit the achievable simulation lenght so severely that the free diffusion is essentially never reached. This leads to extremely noisy energy profiles, making unambiguous interpretation of the results obtained very difficult and their agreement with the eXpected reaction mechanism largely coincidental.

Multidimensional energy scans are able to provide an overall view of the potential energy surface (PES), but often suffer from unsampled degrees of freedom leading to discontinuities that can pass undetected. On the other hand, minimum energy reaction path (MERP) optimization methods enable the sampling of a virtually unlimited number of dimensions, and thus guarantee that a single contiguous path will be obtained. However, there is no indication whether a given minimum energy path is the most probable and physically sound one. It can thus easily happen that a given MERP is deemed to be correct, even though an alternative path with a lower barrier exists in a different region of the PES. Such a situation is obviously impossible to detect without global information about the shape of the PES. By applying both approaches together and cross-check-ing the results, possible errors and artifacts can be easily identified. If the optimised MERP path is geometrically and energetically consistent with the PBS, the possibility of discontinuities in the PES can be ruled out with confidence. At the same time, the shape of the PES can rule out the existence of alternative reaction pathways, validating the MERP. Unfortunately, although the idea of a combined approach is straightforward and its advantages are obvious, such a method is still not being ordinarily used to study enzymatic reactivity. Instead, studies based only on a single method with all its weaknesses are still very common.

This enzyme catalyses the first step in O-linked (mucin-type) protein glycosylation by transferring an N -acetylgalactosaminyl (GalNAc) group onto the serine or threonine hydroxyl moieties of an acceptor protein (Fig. 1). This glycosyltransferase exists in a large variety of isoforms eXhibiting different spatial and temporal eXpression patterns and substrate specificities. [19] Increased activity of ppGalNAcT2 has been linked to the metastatic ability of various types of carcinoma, suggesting that targeted inhibition of a certain isoform could open the way towards selective anticancer drugs. [20] Detailed knowledge of the reaction mechanism and especially the transition state structure could then be used to design a potent inhibitor. Using the combined approach outlined before, we were able to obtain a reliable description of the reaction mechanism of ppGalNAcTZ, including a fully optimised structure of the main transition state.

Results/Discussion

Reactant and product structures

In both cases, the protein consists of the main catalytic domain exhibiting the common GT-A fold and a C-terminal ricin-like lectin domain in a trefoil fold. [23] Both domains are connected by a flexible linker, and as such the lectin domain does not visibly interact with the catalytic domain. Because it is also experimentally known to not be essential for catalytic activity [24], the lectin domain was cut off at conserved [25] proline 435 and not included in further studies.

However, manganese usually occurs in complexes in a high-spin state possessing 5 unpaired electrons. [26] Because this fact would entail a spin-unrestrict-ed treatment of the active site, leading to an almost twofold increase in computational cost and possible convergence problems, we opted for replacing it with magnesium. Such a change has often been used in studies of similar enzymes to allow for spin-restricted calculations, based on tests by Kona and Tvaroska. [27] Although the general applicability of this replacement is uncertain, experimental data on the ppGalNAc transferase isoform 1 clearly show that 89% of its activity (measured as kcat using deglycosylated ovine submaxillary mucin as a poly-acceptor substrate) is retained when magnesium is used instead of manganese. [28] Computational results presented later in this work confirmed the applicability of this replacement.

Initial geometry optimisation of the model led to a dissociated carbocationic state. The reactant and product structures were subsequently obtained by pulling the anomeric carbon towards the respective oxygen atom using a restraint and then fully optimising the geometry after releasing the restraint. Potential energy scans

2) can be described by the parameters shown in Table 1. Based on this data, the initial 2D energy scan was done by scanning the C1-OA distance from 3.00 A to 1.50 A and the Ol-H distance from 1.80 A to 1.05 A, both in steps of 0.15 A. The resulting potential energy surface map is shown in Fig. 3. It shows a large discontinuity in the location of the apparent barrier, caused by a dissociation of the GalNAc-phosphate C1-01 bond (S3 and S4 Figs.). This implies that the calculated surface is, in fact, an artificial combination of two separate fragments of the respective 2D potential surfaces for the bound and dissociated state of UDP-GalNAc. The 2D PES region that would normally connect these two fragments is completely missing due to the inadequate sampling of the aforementioned bond dissociation process. This situation precludes any further utilisation of the results of this scan. Attempts to correct for this problem by running a three-dimensional scan (with the Cl-Ol glycosidic bond length added as a third coordinate) purely in the eXpected transition state region failed to locate a saddle point. This leads us to conclude that even the apparent position of the reaction barrier is incorrect because of the inadequate description of the reactive processes by the chosen scan coordinates. Unfortunately, running a three-di-mensional scan spanning the whole area from reactant to product is not feasible, due to the required number of scan points needed to achieve satisfactory resolution. For this reason we opted for a different set of two scan coordinates.

These are well known in the field of molecular dynamics, but they are not supported by common QM/MM software packages. After implementing them into the ADF program, we carried out another 2D energy scan, varying the nucleophilic substitution coordinate from 1.60 A to —1.80 A in steps of 0.20 A, and the proton transfer coordinate from 0.80 A to —0.30 A in steps of 0.10 A. This resulted in a smooth surface with no identifiable discontinuities, depicted in Fig. 4. However, several isolated data points exhibited energy significantly different from their neighbors, caused by the relatively loose geometry convergence criteria applied in order to keep the computational cost manageable. To create a clearly understandable visualisation with physically relevant contour lines, these points were removed prior to visuali-sation when their energy differed by more than 2 kcal mol‘jL from the average of four directly adjacent points (for interior points) or two adjacent points along the boundary (for points on surface boundary). In total, 12 such points were removed for visualisation, amounting to only 5% of the total point count (SS Fig).

The extent of the saddle region delimited by the 16 and 18 kcal mol‘1 contour lines is particularly noteworthy, as it is a clear sign of the relatively low curvature of the PES around the eXpected transition state. This low curvature makes direct identification of the TS candidates from the potential energy map difficult. We assume that this was the reason why attempts to optimize the transition state structure from only the scan results have been unsuccessful. It is also clear from the potential energy map that the proton transfer cannot serve as the initiating step of the reaction, because that would lead the system into the energetically unfavorable region in the top left corner of the map. Instead, the proton is spontaneously transferred during relaxation into the product minimum, as indicated by the low-energy region in the bottom left corner and the absence of a separate proton transfer barrier in the same region. The first phase of the reaction consists of the dissociation of the GalNAc-phosphate bond, corresponding to an increase in energy around d(C1-OA)—d(C1-01) = 1.00 A. No clear barrier can be identified for this process, as it merely appears to form a shoulder of the main reaction barrier.

Path optimisation

Projection of the initial and final paths into the distance-difference 2D map are shown in Fig. 4. It is apparent that the overall path shape did not change during path relaxation. The potential energy of the individual images is depicted in Fig. 4 by the color of the path points and is clearly in reasonable agreement with the surrounding potential surface. Both facts (the consistency of the path location and the image energies) provide important evidence that the results obtained using both methods are not influenced by errors stemming from incorrect description of the reaction by the selected 2D scan coordinates or an unphysical initial path approximation.

A single very large barrier is present, rising to a maximum relative energy of 14.1 kcal mol‘1 at image 20, followed by a steep yet smooth decline to the product minimum 6.7 kcal mol‘1 below the reactant energy.

14 kcal mol‘1 is in very good agreement with the phenomenological free energy barrier of approx 17 kcal mol_1, that can be calculated using transition state theory from the experimentally determined kcat value of 3.70 s_1. [21] Additionally, the SNi mechanism observed in our study is also supported by experimental kinetic isotope effect data. [12]

The first phase consists of a dissociation of the C1-01 glycosidic bond, covered by images 2—5. The distance of the attacking nucleophile does not change appreciably during this event, showing that the nucleophile is not directly involved in initiating it. On the other hand, the hydrogen bond between threonine hydrogen and phosphate oxygen shortens visibly by about 0.2 A, as this bond is made stronger by the increased negative charge on the oxygen atom after the heterolytic cleavage of the C1-01 bond.

The length of the (now dissociated) Cl-Ol bond increases as the phosphate leaVing group relaxes to a less strained position than the one at the start of the reaction. This gradual separation of the oxo-carbenium ion—leaVing group pair is probably the main reason for the gradual rise in energy, creating the nearly flat top of the barrier. At the same time, the nucleophile hydroxyl moiety is slowly inserted between the anomeric carbon and phosphate, as shown by the decreasing C1 -OA distance. After crossing the saddle point region at image 20, the energy starts to decrease and from image 23 to 25, key changes in bonding take place: o A new C1 -OA glycosidic bond forms between the acceptor and GalNAc, as indicated by the Cl-OA distance decreasing from 2.0 to 1.5 A. o The proton is transferred to the phosphate, While maintaining an exceptionally strong hydrogen bond to threonine With a bond length of only 1.34 A. o The phosphate moves back 0.3 A closer to the GalNAc, probably thanks to the decreasing repulsion With the nucleophile oxygen. In the last five images, the energy decreases as the system releases the conformational strain and the bond lengths relax to their equilibrium values.

Replacing the magnesium ion with the natural manganese ion in high-spin configuration together with a spin-unrestricted calculation does not alter the energies significantly and the shape of the energy profile is almost entirely retained (Fig. 5A). The differences are mainly noticeable in the region of the saddle point and product minimum, where the overall barrier height is lowered by 0.8 kcal mol‘1 to about 13.3 kcal mol_1. The sign and magnitude of this energy difference agrees well with the experimentally observed difference [28] in reactivity for magnesium and manganese, although considering the accuracy limits of the computational methods employed, this could be just a coincidental agreement. Similarly, single-point energies for the profile were recomputed with a larger basis set to check for possible basis incompleteness issues. The profile calculated using the QZ4P basis [29] is qualitatively unaltered, exhibiting only a slightly increased barrier height, by 1.8 kcal mol'l.

Although any density functional can exhibit its own share of problems, it is much less probable that a given artifact would be present in the data calculated using several different density functionals. Additionally, more complex density functionals are inherently more accurate because they are based on fewer approximations in various energy terms. For example, hybrid density functionals suffer much less from artificial locality and self-interaction error than GGA functionals, thanks to the use of the exact electron exchange term in hybrid functionals. Further improvement in accuracy is available by also including the kinetic energy density term, forming the so-called group of meta-hybrid density functionals that represent essentially the best QM methods usable for systems with hundreds of atoms. The results obtained using the hybrid PBEO functional [30, 31] agreed well with the original OPBE ones, but a shallow minimum corresponding to a metastable oxocarbenium intermediate was now visible. The same conclusion could be drawn from a very similar profile calculated by the state-of-the-art meta-hybrid MO6-2X density functional [32, 33]. This observation is similar to the findings of the OtsA study, Which predicted a single displacement but a two-step reaction. [16] On the other hand, the commonly used Becke-PerdeW GGA density functional [34, 35] completely failed to provide a physically sound energy profile (S7 Fig. ).

This is in contrast With the three confirmed stationary points that were successfully optimised using the much faster OPBE functional. For this reason, we selected image 6 as a representative structure of the first transition state and image 9 as the intermediate. Both structures are presented in Fig. 6.

Finally, it has to be noted that the results are based purely on potential energy data While the real physical process is controlled by free energy. The depth of the minimum may thus be significantly affected by the entropic effects included in free energy.

Transition state optimisation

After reaching convergence, the transition state depicted in Fig. 6 was obtained.

The final geometry of the second transition state thus has the same major features as NEB image 20. The proton is still attached to the acceptor oxygen, but at the same time it participates in a very strong hydrogen bond with the leaving group. The carbohydrate ring is in an envelope conformation with a partial half-chair character (S9 F ig. ), and a new C1 -OA glycosidic bond is being formed. Its length in the optimised transition state is 2.33 A, almost exactly matching the length of the dissociating Cl-Ol glycosidic bond in the estimated structure of the first transition state. This observation supports the previously proposed concept [5, 12] of the two transition states involving each glycosidic bond being very similar, almost “mirror images” of each other. To verify the correctness of the obtained transition state, a full QM/MM numerical Hessian was subsequently calculated. It contains exactly one negative eigenvalue in both the non-mass-weighted and mass-weighted (normal mode) coordinate systems, confirming that the structure corresponds to a first-order saddle point. The calculated imaginary frequency of this normal Vibration mode is 961' cm_1, in line With the previously observed low curvature of the PES. Visual inspection of the normal mode motion confirmed that it represents the nucleophilic substitution process.

Important interactions stabilizing the transition state

We have identified several interactions that probably play a key role in facilitating the reaction. These can be divided into three main groups: interactions with structural role (enforcing a proper relative positioning of the substrates), those stabilizing the positive charge on the Gal-NAc oxocarbenium ion and finally interactions stabilizing the negative charge on the diphos-phate moiety of the UDP leaving group. Among the structural interactions there are several hydrogen bonds coordinating the Gal-NAc moiety: a bond between the amidic backbone hydrogen of Gly309 and the N-acetyl carbonyl oxygen, two hydrogen bonds between Glu334 and the O4 and O6 hydrogens of GalNAc and a hydrogen bond between Arg208 and the O4 GalNAc oxygen. All of those interactions keep the GalNAc moiety rotated around the glycosidic bond towards the diphosphate group, exposing the glycosidic bond and C1 atom to a nucleophilic attack by the acceptor. The positive charge that develops on C1 after bond dissociation interacts in a charge-dipole manner with the carbonyl group of Ala307, a member of a loop covering the fi-face of GalNAc. Apart from the electrostatic effects of the metal cation and hydrogen bonding with the water molecules coordinated to this ion, the leaving group is additionally forming a strong hydrogen bond with Tyr367, a hydrogen bond with Arg362 and another, relatively weak hydrogen bond with the amidic hydrogen of the acceptor threonine. There is also an important intramolecular hydrogen bond between a phosphate oxygen and the amidic hydrogen of Gal-NAc that further contributes to keeping the saccharide moiety suitably rotated.

It plays a double role: Forms a CH-pi interaction with the C6 hydrogen atoms of GalNAc and at the same time donates a hydrogen bond to the phosphate oxygen participating in the original glycosidic bond. This hydrogen bond is quite weak in reactant (having a length of 2.69 A), but grows much stronger after dissociation of the glycosidic bond, achieving its minimum length of 1.76 A around the transition state and then getting weaker again after the nucleophilic capture (2.02 A in product). All of the enzyme residues participating in these interactions are highly conserved and were experimentally identified as being crucial for preserving reactivity (Table 2).

Conclusions

The optimized transition state for the reaction is 13.8 kcal mol‘jL higher in energy than the reactant, while the energy of the product complex is 6.7 kcal mol‘jL lower. This corresponds to a dissociated oxo-carbenium state just before its nucleophilic capture by the acceptor threonine oxygen. During the process of nucleophilic attack, a proton is synchronously transferred to the

leaving phosphate.

By coupling two different QM/MM-based approaches for investigating the reaction mechanism, namely a PBS scan in two distance-difference dimensions and a MERP optimisation using the NEB method, we were able to rule out the most significant sources of potential errors. We can therefore conclude that the observations based on theoretical modeling are in good Table 2. Parts of conserved enzyme residues included in GM region. Residue Included part Function agreement with available experimental evidence [12] , including the recent X-ray structures based on modified substrates. [36]

The barrier of this step is lower than 10 kcal mol‘jL and is thus hidden under the barrier of the rate-determining step. However, the presence of a short-lived metastable oxocarbenium ion is likely, because the corresponding energy minimum is visible in energy profiles obtained using higher-level density functionals. On the other hand, the minimum is only ca. 1 kcal mol‘1 deep, and such a subtle energy difference is at the limit of the accuracy provided by applicable theoretical modeling approaches. Additionally, the stability of the intermediate can be affected by other entropic and environmental phenomena not considered here. We have shown that distance-difference coordinates provide an exceedingly useful tool for the description of common reactive processes, and are certainly useful for the wider scientific community interested in mapping reaction potential energy surfaces. Additionally, the Nudged Elastic Band method for MERP optimisation is suitable for a rapid exploration of reaction pathways, and it is immune to the coordinate sampling problems common in static energy mapping. However, knowledge of the shape of the PES is necessary to ensure that a physically relevant path is selected for optimisation.

Methods

QM/MM model of ppGaINAcT2

The final RMSD was 0.88 A for C-alpha atoms and 1.33 A for all protein atoms. Visual inspection of the active site showed near perfect overlap of the UDP molecules and neighboring side chains, allowing the coordinates of GalNAc to be directly transferred from 2D7I into 2FFU.

First, the required fragment file for UDP-GalNAc was generated by the antechamber tool from AmberTools 1.4 [38] with partial atomic charges calculated by the AMI-BCC method [39] to give a total charge of —2. The protonation and oxidation states of relevant protein residues were assigned automatically by pdb2adf based on their chemical environment and visually checked for correctness. The protonation state of histidine 359 was manually overridden to HID as the automatically generated one (HIE) was obviously nonsensical. This residue is a ligand of the metal cofactor and therefore needs to have the Ne atom unprotonated. Furthermore, water molecules present in the active site were manually rotated to create a network of hydrogen bonds where possible.

The UDP donor was represented by methyl di-phosphate, as the ribose and uracil parts of the molecule are quite far from the reactive site. The acceptor threonine 7 of the EA2 peptide was included together with its direct neighbors, threonine 6 (excluding its amine group) and proline 8 (excluding its carboxyl group). Finally, 12 highly conserved residues interacting with the substrates were included (Table 2) as well as six water molecules that were well defined in the crystal structure (B-factors below 20). Three of these water molecules are located close to the metal ion with one of them directly serving as a ligand and the other two forming a hydrogen bond network between the first water molecule and neighboring active site residues. The other three water molecules are located approximately in the plane of the pyranose ring or slightly towards its beta-face (the face opposite to the glycosidic bond). These molecules form a hydrogen bond network, acting as hydrogen bond donors to the GalNAc 05 oxygen and the acceptor threonine oxygen. However, they are separated from the leaving diphosphate group by the carbohydrate and threonine moieties and thus cannot directly participate in the reaction as catalytic acids or bases or proton transfer mediators. The resulting system contains 252 real (non-capping) atoms in the QM region and 6051 atoms in total.

The NEWQMMM implementation of molecular mechanics in ADF was employed to manage the MM part of the system, described by the AMBER ff94 forcefield [42] combined with GLYCAM06 parameters [43] on the GalNAc group. The AddRemove QM/MM coupling scheme [44] was used and charges on the QM atoms were updated in every geometry iteration from the MDC decomposition [45] up to the dipole level. Hydrogen capping atoms were added by the AddRemove scheme to saturate link bonds crossing the QM/ MM boundary, bringing the overall QM atom count to 275.

This functional has been shown to be the best GGA functional for describing nucleo-philic substitution reactions. [49] In tests, it provided results qualitatively similar to the M06-2X meta-hybrid density functional [32, 33]. The description of weak interactions was augmented by the DFT-D3 empirical dispersion correction [50, 51] in “zero-damping” form. All calculations were carried out using the all-electron Slater-type TZP basis [29] with the charge fitting set distributed with ADP.

A Becke grid integration scheme [52, 53] was used for all calculations using ADF 2013.01, with resolution given by the “Normal” preset. A Voronoi cell based integration method [54] was used in the energy scans and M06-2X single point calculations, because the newer Becke scheme is not available in ADF 2012.01d. The number of integration points for this method was automatically determined by ADF to meet predefined accuracy level 4 for PBS scans or 6 for M06-2X calculations. To reduce random integration noise in the gradients and prevent it from spoiling the Hessian estimates, a smoothing method based on conservation of the Voronoi cells and integration points across the geometry steps was applied [53].

This was ensured by optimising the MM system by the scaled conjugate gradient method [55] until all MM gradient vector components decreased below 0.01 kcal mol‘jL A_1. SCF optimisation of the QM region was stopped when the maximum element of the commutator of the last two Fock matrices decreased below 10—5 au. Potential energy scans

In the first scan, two distance coordinates were used: d(C1-OA) and d(Ol-H). The second scan used two distance differences instead to provide a better description of the respective processes: d(C1-OA)—d(C1-Ol) describing the nucleophilic substitution and d(Ol-H)—(OA-H) describing the proton transfer. All scan coordinates were implemented using restraints. Support for distance difference restraints was implemented into ADF and subsequently added to the mainline distribution.

Afterwards, the second coordinate was scanned to the product value, generating the second scan dimension. The optimisation of each point started with the geometry, charges and MO coefficients of the preceding point in a given scanline and proceeded using a quasi-Newton optimizer with Broyden-Fletcher-Goldfarb-Shanno Hessian updates until the maximum gradient component decreased below 0.01 Hartree fifl.

Nudged elastic band optimisation

The algorithm was based on the ASE [58] Python toolkit coupled to ADF for MM optimisation and gradient evaluation. Cartesian coordinates of all 252 real QM atoms were used to describe the configuration of each NEB image, leading to an optimisation of the reaction path in the full 756-dimensional space without any a priori assumptions regarding the reaction coordinate. Just as in the case of PES scans, the positions of all MM atoms for each image were fully optimised before every gradient calculation. The path was discretized into 30 images, where the reactant and product endpoints were kept fixed and the rest was optimized simultaneously using the FIRE algorithm [59]. This algorithm was selected because quasi-Newton algorithms do not work well with NEB, both due to the high dimensionality (28 images X 756 coordinates each = 21,168-dimensional optimisa-tion space) and especially because the NEB Hessian matrix is not symmetric [60] and therefore can not describe the locally quadratic surface assumed by quasi-Newton algorithms. The FIRE algorithm is the most sophisticated algorithm implemented in ASE that is able to deal with this problem. All internal optimizer parameters were kept at their default values.

A partial perpendicular force term [56] was added to keep the path stable and smooth. The fraction of perpendicular force included was determined by where gb denotes the angle between adjacent path segments. Although this means that the result is no longer a rigorously correct minimum potential energy path, but just an approximation of it, the difference is minimal, and it is only visible in regions that are not particularly interesting in terms of the reaction mechanism. The force constant k for NEB springs was set to 5 eV A_1. Path optimisation was stopped when the maximum element of total NEB gradient decreased below 0.0025 Hartree A_1.

Structures were then generated by successive optimisation starting from the reactant structure with the two distance difference coordinates restrained to the values corresponding to a given point. All these structures were directly used as NEB images. Because the NEB approach outlined above forces the images to be equidistant with respect to their 756-D Euclidean distance, additional images were added by linear interpolation between the obtained structures until all path segments were shorter than 0.25 A. To keep the implementation simple, only the positions of QM atoms were interpolated; the positions of MM atoms and the QM charges were directly copied from the nearest parent structure. Transition state refinement

Full numerical Hessian from a preceding calculation on image 23 was used as the initial Hessian for the optimisation. After reaching convergence, a full numerical Hessian of the total QM/MM energy was calculated for the optimized structure using symmetric central two-point finite differentiation of gradients with a step of 0.01 A. Cremer-Pople conformational parameters [62] for the carbohydrate ring were calculated using the cp.py script by Hill and Reilly [63].

Supporting Information

81 Fig. Superposition of the crystal structures used to build the initial model of ppGalNAcT2. Isoform 2 (PDB ID 2FFU) is shown in gray and isoform 10 (PDB ID 2D7I) in green. The ricin-like lectin domains (left part) were not considered in this study. The overlapping crystal positions of the uridine diphosphate and the positions of the EA2 acceptor peptide (from 2FFU) and GalNAc (from 2D7I) are depicted in ball-and-stick representation.

(EPS)

Note especially the abrupt change of the Cl-Ol distance upon crossing the apparent enery barrier. The positions of the expected stationary points obtained by the path optimisation visibly fall into a region Where both distances shown here still have reactant-like values, confirming the unphysical nature of distance-based scan results.

A discontinuity is Visible in the region between both transition states. It is clear that the points obtained by scanning the two distances fall either into the reactant or product basin and don’t sample the barrier region at all. (EPS)

Twelve outlying points (crossed out) were not used for Visualisation of the total energy surface in order to obtain physically meaningful contour lines. (EPS)

(EPS)

Notice the shallow minimum in QM energy around image 9 (expected intermediate), and a corresponding maximum in MM energy in the same region. These two contributions cancel out (for the OPBE functional), leading to no minimum being present on the total energy profile. QM energy profile calculated using the Becke-PerdeW functional is included, illustrating that this method is unable to produce a physically meaningful description of the reaction. (EPS)

Changes between iterations 80 and 100 (final) are negligible, confirming that the

The third quadrant of the pseudorotational diagram constructed from Cremer-Pople ring puckering coordinates 81 Supporting Information. Complete structures of all (optimised or estimated) stationary points. TWO files are provided for each stationary point—a PDB file for the Whole system and a XYZ file with only the QM region. (ZIP)

Author Contributions

Performed the experiments: TT. Analyzed the data: TT SK IT IK. Wrote the paper: TT SK IT IK.