Machine Learning Methods Enable Predictive Modeling of Antibody Feature:Function Relationships in RV144 Vaccinees

Ickwon Choi, Amy W. Chung, Todd J. Suscovich, Supachai Rerks-Ngarm, Punnee Pitisuttithum, Sorachai Nitayaphan, Jaranit Kaewkungwal, Robert J. O'Connell, Donald Francis, Merlin L. Robb, Nelson L. Michael, Jerome H. Kim, Galit Alter, Margaret E. Ackerman, Chris Bailey-Kellogg

Published in PLOS Comp. Biol., April 2015

Abstract

The non-neutralizing role of antibodies in stimulating effector cell responses may have been a key mechanism of the protection observed in the RV144 HIV vaccine trial. In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). We demonstrate via cross-validation that classification and regression approaches can effectively use the antibody features to robustly predict qualitative and quantitative functional outcomes. This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates.

Author Summary

Antibodies are one of the central mechanisms that the human immune system uses to eliminate infection: an antibody can recognize a pathogen or infected cell using its Fab region while recruiting, through its Fc region, additional immune cells that help destroy the offender. This mechanism may have been key to the reduced risk of infection observed among some of the vaccine recipients in the RV144 HIV vaccine trial. In order to gain insights into the properties of antibodies that support recruitment of effective functional responses, we developed and applied a machine learning-based framework to find and model associations among properties of antibodies and corresponding functional responses in a large set of data collected from RV144 vaccine recipients. We characterized specific important relationships between antibody properties and functional responses, and demonstrated that models trained to encapsulate relationships in some subjects were able to robustly predict the quality of the functional responses of other subjects. The ability to understand and build predictive models of these relationships is of general interest to studies of the antibody response to vaccination and infection, and may ultimately lead to the development of vaccines that will better steer the immune system to produce antibodies with beneficial activities.

Introduction

This correlation is often thought to be mechanistic, as in numerous disease settings passively transferred antibodies provide protection from infection [2]. Yet, the fact that some vaccines that induce an antibody response do not provide protection indicates that beyond presence and prevalence, there are specific antibody features associated with protection: that is, not all antibodies are created equal. Efforts to develop a protective HIV vaccine may represent the setting in which the discrepancy between the generation of a robust humoral immune response and generation of protective humoral immunity has been most apparent. That this might be a more general observation is suggested by recent dengue vaccine trials, where protection was seen but did not appear to correlate with the well-established virus neutralization assay [3,4].

Due to viral diversity, vaccine-specific antibodies may or may not recognize circulating viral strains [6]. Furthermore, beyond viral recognition, binding antibodies vary considerably in their ability to neutralize diverse viral variants (case studies in [7,8] and reviewed in [9]), with most antibodies possessing weak and/or narrow neutralization activity [10]. While generating broadly neutralizing antibodies represents a cornerstone of HIV vaccine efforts, as these antibodies clearly block infection in animal models [11], vaccines tested thus far have induced antibodies with only a limited ability to neutralize viral infectivity [12]. However, beyond this role in the direct blockade of viral entry, antibodies mediate a remarkable repertoire of protective activities through their ability to recruit the antiviral activity of innate immune effector cells. Yet, here as well, the ability of HIV-specific antibodies to act as molecular beacons to clear virus or virus-infected cells is also widely divergent [13].

While a number of structure:function relationships have been characterized in terms of virus recognition, neutralization, and innate immune recruiting capacity, our understanding of the relationship between antibody features and their protective functions remains incomplete. However, the recent development of high-throughput methods to assess properties of both antigen recognition and innate immune recognition [14] offers more fine-grained information about the antibody response, which could feed into the development of models to inform our understanding of antibody activity.

Importantly, within this trial, the correlates of reduced risk of infection were binding antibodies, and, in the absence of an IgA response, antibody function, in the form of natural killer (NK) cell-mediated antibody-dependent cellular cytotoxicity [16]. Subsequent analysis has supported these findings, with evidence of the impact of variable domain-specific antibodies apparent in the sequences of breakthrough infections [17], and antibodies of the IgG3 subclass associated with reduced risk of infection [18]. Because the vaccine was partially efficacious, studying the diversity of antibody responses among volunteers has the potential to help identify novel immune correlates. Thus, this trial represents a compelling opportunity to profile antibody structure:function relationships from the standpoint of relevance to protection and an excellent setting in which to apply machine learning methods to characterize the relationship between antibody features and function in a population whose response to vaccination varied in a clinically relevant way.

These effector functions are mediated by the combined ability of an antibody’s Fab to interact with the antigen and its Fc to interact with a set of FcR expressed on innate immune cells. Just as Fab variation impacts antigen recognition, Fc variation in IgG subclass dramatically influences FcR recognition, and antibody effector function is widely divergent among antibodies from different subject groups in ways that are not explained by titer, or the magnitude of the humoral response [22]. Therefore, we characterize the combination of antigen specificity and subclass in a multiplexed fashion (“antibody features”), and couple that characterization with assessments of effector activities from cell-based assays (“antibody functions”). This antibody feature and function data have previously been subjected to univariate correlation analysis, which identified associations between gp120-specific IgG3-subclass antibodies and coordinated functional responses in RV144 subjects. Conversely, IgG2- and IgG4-subclass antibodies were associated with decreased activity, and subsequent depletion studies confirmed these discoveries [23].

While “predict” often connotes prospective evaluation, here, as is standard in statistical machine learning, it means only that models are trained with data for some subjects and are subsequently applied to other subjects in order to forecast unknown quantities from known quantities. In particular, we show that not only are antibody features correlated with effector functions, but that computational models trained on feature:function relationships for some subjects can make predictions regarding the functional activities of other subjects based on their antibody features. Using unsupervised methods we find patterns of relationships between antibody features and effector functions as well as among features themselves. Then, using classification methods we demonstrate via cross-validation that antibody features support robust qualitative predictions of high vs. low function, and using regression methods we likewise demonstrate that the features can enable quantitative predictions of functionality across multiple, divergent activities. The various methodologies are relatively consistent in both performance and identified features, giving confidence in the general procedure and the information content in the data. This objective approach to developing predictive models based on patterns of antibody features provides a powerful new way to uncover and utilize novel structure:function relationships.

Results

To model antibody feature-function relationships we analyzed samples from 100 subjects in the RV144 trial. A set of 3 different cell-based assays was conducted to characterize the functional activity of these samples, providing data regarding the effector function of antibodies induced by RV144 including: gp120-specific antibody dependent cellular phagocytosis (ADCP) by monocytes [24], antibody dependent cellular cytotoxicity (ADCC) by primary NK cells

We note that the subsequent analyses all use scaled and centered feature data, as the different features are on different and somewhat arbitrary scales according to bead set and detection reagent, and this standardization enables combination of the relative feature levels across these different scales. As a linear transformation, the standardization does not affect linear models, though the additional preprocessing truncation to 6σ has an appropriate impact on outliers. The function data are only standardized for this visualization, as the assay values are meaningful for interpreting predictions.
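The scaling, centering, and truncation described above can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code: the function name `standardize`, the `clip` parameter, and the toy values are all assumptions introduced here.

```python
import statistics

def standardize(column, clip=None):
    """Center a feature column to mean 0 and scale to unit sample SD.

    `clip` is a hypothetical parameter: when given, standardized values
    are truncated to the interval [-clip, clip] to limit outlier influence.
    """
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    if sd == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    z = [(x - mean) / sd for x in column]
    if clip is not None:
        z = [max(-clip, min(clip, v)) for v in z]
    return z

# Toy column (made-up values) with one extreme reading.
raw = [10.0, 12.0, 11.0, 13.0, 250.0]
z = standardize(raw, clip=1.5)
```

Because each feature is standardized separately, features measured on different bead sets or detection reagents end up on a common relative scale before being combined in a model.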

We observe that the antibody features and functions are far from uniform. The relative functional responses differ by subject and by function, though a number of subjects exhibit relatively strong or weak responses in multiple functions. Likewise, relative antibody feature strength differs by subject and feature, and notably some subjects exhibit relatively strong responses across multiple antigen specificities for a given IgG subclass and/or strong responses across multiple subclasses for a given antigen specificity. Finally, there are relationships between the features and functions by subject, e.g., a group of subjects with strong ADCP and ADCC responses appear also to have strong feature characteristics. In order to better extract, assess, and utilize such observations, machine learning techniques were applied to provide models of the relationship between characteristics of HIV-specific antibodies induced by vaccination, and their functional activity.

Unsupervised learning

Consistent with their binding affinity to FcγR expressed on monocytes, IgG1 and IgG3 subclasses are most correlated with strong ADCP function, while IgG2 and IgG4 are less correlated or even mildly anticorrelated. Similarly, gp120 and V1V2 antigens tend to yield the strongest correlations, as would be expected given the direct experimental relevance of these antigens to this functional activity. For ADCC, the IgG1 correlations are weaker and the IgG3 correlations weaker still, while the IgG2 and IgG4 classes are now slightly more correlated (particularly IgG2.gp41). For the cytokines, strong IgG1 and IgG3 correlations are observed, particularly with gp120 and V1V2. The IgG4 subclass also yields some strong correlations, likely influenced by the large number of subjects with undetectable IgG4 responses (uniform colors within a column in Fig 1, no longer 0 after standardization), and rare subjects with strong IgG4 responses.
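A feature:function correlation profile of the kind discussed above can be computed with the Pearson correlation coefficient. The sketch below uses made-up numbers for two features and one function across five hypothetical subjects; only the feature naming convention (subclass.antigen) is taken from the text.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is 0).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy values (illustrative only, not data from the study).
features = {
    "IgG1.gp120": [0.2, 0.9, 1.5, 2.1, 3.0],
    "IgG2.gp140": [2.0, 1.6, 1.7, 0.9, 0.4],
}
adcp = [1.0, 1.8, 2.9, 4.2, 5.1]

# One correlation per feature: the feature:function profile for ADCP.
profile = {name: pearson(values, adcp) for name, values in features.items()}
```

A positive entry in `profile` marks a feature that rises with the function across subjects, and a negative entry marks an anticorrelated one.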

Indeed, hierarchical clustering of the feature correlation profiles (Fig 2B) reveals that the features are not independent; in fact, the true dimensionality of the data is lower than the number of original columns. The figure highlights six clusters of mutually correlated features formed by bisecting the dendrogram as indicated to strike a balance between the number of clusters and their visual coherence. An array of statistical methods to determine an optimal number of clusters gave substantially different answers from each other, though the optimal partitions they identified were largely consistent with how one might manually divide the dendrogram (results not shown). Some of these clusters are defined by Ab subclass (each IgG subclass dominates one cluster), while others are defined by antigen specificity (V1V2 and p24 clusters are also observed). Correlations between IgG1 and IgG3-defined clusters are also observed. The combination of the feature:feature clustering and the feature:function correlations observed suggests that different groups of subjects produce characteristically different antibody responses, yielding different functional outcomes.
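As a rough illustration of grouping mutually correlated features, the sketch below runs a small agglomerative clustering and stops at a requested number of clusters. The distance (1 minus the Pearson correlation) and the average linkage are assumptions made here, since this section does not state which were used, and the four toy features are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def agglomerate(profiles, n_clusters):
    """Average-linkage agglomerative clustering on 1 - Pearson distance."""
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

# Toy per-subject feature values (illustrative only).
profiles = {
    "IgG1.gp120": [1.0, 2.0, 3.0, 4.0, 5.0],
    "IgG1.gp140": [1.1, 2.2, 2.9, 4.1, 5.3],
    "IgG4.gp120": [5.0, 1.0, 4.0, 2.0, 3.0],
    "IgG4.gp140": [5.2, 0.9, 4.1, 2.2, 2.8],
}
clusters = agglomerate(profiles, n_clusters=2)
```

Stopping the merging at a fixed cluster count plays the same role as cutting the dendrogram at a chosen height.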

To support the supervised analysis below, a “filtered” feature set was developed for each function. Filtered features were selected by choosing the feature most strongly correlated with the function within each cluster, in terms of the magnitude of the Pearson correlation coefficient (Fig 2A). Filtered features for each functional measurement are starred in Fig 2B, and span the full range of subclasses and antigen specificities. Thus, while redundancy is reduced, the ability to obtain insights into the relative contributions of each feature type to functional activities is maintained. While there are non-negligible correlations outside the clusters (and indeed between these selected features), the supervised results show that they have little impact on predictive performance.

As an alternative method to account for the possible redundancy among antibody features, a principal component analysis (PCA) was also performed. PCA yields a set of principal components (PCs) that represent the main patterns of variability of the antibody features across subjects. The PCs provide a new basis for the data; i.e., each observed feature profile is a weighted combination of the PC profiles, so we can think of the PCs as “eigen-antibodies”. In contrast to the filtered features, the principal components are composites, and by inspecting their composition, we can see the patterns of concerted variation of the underlying antibody features. Fig 2C illustrates the principal components and S1 Fig provides the corresponding eigenvalue spectrum (the relative amount of variance captured by each PC). While PC1 is essentially a constant offset by which to scale the overall magnitude of a feature profile, the other leading PCs reflect many of the same relationships also observed in the clustering analysis, including both subclass relationships and antigen specificity relationships. In particular, PC2 largely contrasts IgG2/4 vs. 1/3 composition, PC3 IgG4 vs. others, and PC4 IgG3 vs. others, while PC5 focuses on the relative p24-associated contribution, PC6 that of V1V2, and PC7 apparently an even finer-grained V1V2 specificity. As these leading seven principal components are the most readily interpretable and cover a large fraction of the variance in the data (S1 Fig), they are used for supervised learning below, and trailing PCs are dropped.

The unsupervised analysis suggests that there is indeed a high level of information content in the data, evidenced by the relationships among features identified by the clustering and PCA approaches, the correlations between the antibody features and the functions, and the agreement of these relationships with biological intuition. The strong relationships uncovered by these methods suggest that it might be possible to build models to predict functions from features, whether directly measured features or derived composites.
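The two redundancy-reduction strategies described in this section, picking the best-correlated feature per cluster and projecting onto leading principal components, can be sketched with NumPy. The data here are synthetic, the cluster membership is hypothetical, and computing PCA through an SVD of the centered matrix is one standard route rather than a statement of how the authors computed it.

```python
import numpy as np

def filter_features(X, y, names, clusters):
    """Within each cluster, keep the feature with the largest |Pearson r| to y."""
    r = {name: np.corrcoef(X[:, i], y)[0, 1] for i, name in enumerate(names)}
    return [max(cluster, key=lambda name: abs(r[name])) for cluster in clusters]

def pca(X, n_components):
    """PCA via SVD of the column-centered data matrix.

    Returns (components, scores, fraction of variance per component).
    """
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_fraction = s ** 2 / np.sum(s ** 2)
    components = Vt[:n_components]          # rows are the "eigen-antibodies"
    scores = Xc @ components.T              # per-subject PC coordinates
    return components, scores, var_fraction[:n_components]

# Synthetic stand-in data: two latent signals, each measured twice with noise.
rng = np.random.default_rng(0)
n = 100
latent_a, latent_b = rng.normal(size=n), rng.normal(size=n)
names = ["IgG1.gp120", "IgG1.gp140", "IgG4.gp120", "IgG4.gp140"]
X = np.column_stack([
    latent_a + 0.1 * rng.normal(size=n),
    latent_a + 1.0 * rng.normal(size=n),
    latent_b + 0.1 * rng.normal(size=n),
    latent_b + 1.0 * rng.normal(size=n),
])
y = latent_a - latent_b                      # toy "function" values
clusters = [names[:2], names[2:]]            # hypothetical cluster membership

selected = filter_features(X, y, names, clusters)
components, scores, var_fraction = pca(X, n_components=2)
```

The filtered set keeps measured features, so coefficients stay directly interpretable, whereas the PC scores are composites whose meaning has to be read off the rows of `components`.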

Supervised learning: Classification

To assess how much this discrimination depends on the classification approach utilized rather than the underlying information content in the data, we employed three different representative classification techniques: penalized logistic regression (a regularized generalized linear model based on Lasso), regularized random forest (a tree-based model), and support vector machine (a kernel-based model). Furthermore, in order to assess the effect of reducing redundancy and focusing on the most interpretable feature contributions, three different sets of input features were considered: the complete set (20 features: 4 subclasses * 5 antigens), the filtered set with one feature selected from each cluster based on correlation with function (6 features), and the PC features (7 leading PCs), as illustrated in Fig 2. Separate classifiers were built for each function and each input feature set.

To assess the overall performance, we conducted 200 replicates of fivefold cross-validation. That is, for each of 200 replicates, the subjects were randomly partitioned into five equal-size sets, or “folds”, and five different models were constructed. Each model was trained using data for four of the sets of subjects, and then was used to make predictions for the fifth “held-out” set. The predictions for the held-out subjects were compared against the known (but ignored for training) values, and performance assessed accordingly. By repeating this 200 times, the impact of the random split can be factored out.
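The replicate-and-fold bookkeeping described above can be sketched in a few lines. The function name `repeated_kfold` and the seed are assumptions for illustration; the defaults mirror the 200 replicates of fivefold cross-validation over 100 subjects mentioned in the text.

```python
import random

def repeated_kfold(n_subjects, n_folds=5, n_replicates=200, seed=0):
    """Yield (train_indices, held_out_indices) for repeated k-fold CV.

    Each replicate reshuffles the subjects and partitions them into
    `n_folds` sets; every set serves once as the held-out fold.
    """
    rng = random.Random(seed)
    for _ in range(n_replicates):
        order = list(range(n_subjects))
        rng.shuffle(order)
        folds = [order[k::n_folds] for k in range(n_folds)]
        for k, held_out in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, held_out

splits = list(repeated_kfold(n_subjects=100))
```

A model would be fit on each `train` list and scored on the matching `held_out` list, and the resulting scores summarized across all replicates.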

These data pose a difficult classification problem as there is not a clear distinction between high and low classes, which were simply defined by the median value. Nonetheless, even with a rigorous 200-replicate fivefold cross-validation, a mean AUC of 0.83 (standard deviation of 0.10) was observed, indicating that antibody features are highly and robustly predictive of high vs. low ADCP activity. Fig 3G shows the contributions of the antibody subclass-specificity features to a classifier trained on the whole dataset; while the coefficient values varied in individual folds, the same overall trends were observed over the different splits (results not shown).
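For readers unfamiliar with the two quantities involved, the sketch below shows a median split into high and low classes and a rank-based AUC (the probability that a randomly chosen high-class subject receives a higher score than a randomly chosen low-class subject). Assigning values equal to the median to the low class is an assumption made here, and the numbers are toy values.

```python
import statistics

def median_split(values):
    """Label each value 1 (high) if above the median, else 0 (low)."""
    med = statistics.median(values)
    return [1 if v > med else 0 for v in values]

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy measured function values and toy classifier scores.
adcp = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [0.1, 0.4, 0.35, 0.8, 0.3, 0.9]

labels = median_split(adcp)      # [0, 0, 0, 1, 1, 1]
value = auc(labels, scores)      # 7 of 9 high/low pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.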

The model sums the feature values, each weighted by its specific coefficient, and then applies a logistic function to yield the predicted classification value. In order to counteract overfitting, the training process imposes a penalty relative to feature coefficients and thereby seeks a sparse model. The coefficients give the relative importance of each feature to the predictor; associated p-values indicate the confidence in those coefficient values (a large p-value indicates an unreliable estimate of the feature contribution). Thus we see, for example, that the two dominant and statistically significant (at an unadjusted 0.05 level) contributors to predicting ADCP class are IgG1.gp120 and IgG3.p24, capturing both key subclasses with two different antigen specificities. While not achieving statistically significant confidence in the coefficient value, negative contributions from IgG2 were also observed, consistent with the unsupervised analysis and the reduced ability of this subclass to bind to FcγR on phagocytes presumably due to blocking (i.e., preferred binding of antibodies with better affinity). No systematic pattern was observed among the misclassified samples; they varied over the 200 splits and were distributed over the whole range of ADCP values. They did, however, tend to be those subjects with the weakest overall feature profiles, without large contributions from features with either positive or negative coefficients.
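To make the "weighted sum, logistic function, sparsity penalty" description concrete, here is a minimal L1-penalized logistic regression fit by proximal gradient descent. The solver, step size, penalty strength `lam`, and synthetic data are all assumptions chosen for illustration; the section does not say which software or solver the authors used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink toward 0, clip at 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_l1_logistic(X, y, lam=0.05, step=0.1, n_iter=2000):
    """Minimize mean logistic loss + lam * ||w||_1 (intercept unpenalized)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)               # predicted P(class = high)
        grad_w = X.T @ (p - y) / n
        grad_b = np.mean(p - y)
        w = soft_threshold(w - step * grad_w, step * lam)
        b -= step * grad_b
    return w, b

# Synthetic data: only the first of three standardized features matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(float)

w, b = fit_l1_logistic(X, y, lam=0.05)
w_heavy, _ = fit_l1_logistic(X, y, lam=10.0)   # very strong penalty
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
```

The soft-thresholding step is what drives uninformative coefficients to exactly zero; with a very large penalty every coefficient is zeroed and only the intercept remains.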

To obtain a sparser and less redundant model, we trained classifiers using the filtered features from Fig 2B. Despite the reduction in data considered, Fig 3C and 3D show that the resulting performance with the filtered feature set is comparable to that with the complete feature set, with a mean AUC of 0.84 (standard deviation 0.10). The feature contributions in Fig 3H are still driven by positive contributions of IgG1 and IgG3 with some of the same antigens, along with negative IgG2 (with gp140).

Thus we assessed each possible combination of features taken from the six clusters in Fig 2B. We found that on average an AUC of 0.79 was obtained, with a range from 0.67 to 0.87 and a standard deviation of 0.04 (recall that the PCC-based approach obtained 0.84). This result supports the conclusion that these groups of features do contain more or less redundant information in terms of predicting function. Using the best correlated features provides a sparse model that predicts as well as the model built from the complete feature set, and carries the advantage of being less likely to perform well due to overfitting, and thus more interpretable in terms of the underlying biology.
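Enumerating every way of taking one feature from each cluster is a Cartesian product, sketched below. The cluster membership shown is hypothetical (three small clusters rather than the six in Fig 2B), and `toy_score` is a placeholder standing in for a cross-validated AUC, which a real analysis would compute by training a classifier on each combination.

```python
import itertools
import statistics

def one_per_cluster(clusters):
    """All feature sets formed by choosing exactly one feature per cluster."""
    return list(itertools.product(*clusters))

def summarize(scores):
    """Mean, range, and standard deviation of per-combination scores."""
    return {
        "mean": statistics.fmean(scores),
        "min": min(scores),
        "max": max(scores),
        "sd": statistics.stdev(scores),
    }

# Hypothetical cluster membership, for illustration only.
clusters = [
    ["IgG1.gp120", "IgG1.gp140"],
    ["IgG3.gp120", "IgG3.V1V2", "IgG3.p24"],
    ["IgG2.gp140"],
]

def toy_score(combo):
    # Placeholder scorer (NOT an AUC): fraction of gp120 features chosen.
    return sum(name.endswith("gp120") for name in combo) / len(combo)

combos = one_per_cluster(clusters)
summary = summarize([toy_score(c) for c in combos])
```

The number of combinations is the product of the cluster sizes, so this exhaustive check is only practical when clusters are few and small.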

Thus we also trained classifiers using the principal components as features. Using these alternative, composite features, performance quality was maintained (Fig 3E and 3F), with a mean AUC of 0.82 (standard deviation 0.11). Inspecting the key PCs contributing to a classifier, we see that PC2 (IgG2/4 vs. 1/3) makes the biggest contribution, modulated by subclass contributions in PC3 (IgG4) and PC4 (IgG3) and antigen contributions in PC5 (p24) and PC6 (V1V2) (Fig 3I). Thus the PCA-based approach is largely consistent with the others, with subclass and antigen specificity again working in concert to predict function.

All three machine learning techniques perform quite well, despite the difficulty of the median-split classification problem and the rigorous fivefold cross-validation assessment. The PLR model is consistently a bit better, and performance is essentially equivalent for each technique across the different feature sets (complete, filtered, or PC), suggesting that over a wide range of different modeling approaches, antibody features are indeed robustly predictive of qualitative effector function.

The cytokine classifiers perform nearly as well as the ADCP ones, and the ADCC classifiers less accurately but still strikingly well. The choice of feature set (complete, filtered, PC) did not have a substantial effect on performance. The PLR approach was generally superior, with RF quite comparable and SVM somewhat degraded but still yielding good performance. Thus our hypothesis that antibody features enable robust, high-quality prediction of antibody function is well-supported by the summary results for each of three distinct effector functions. Furthermore, the logistic regression model enables straightforward identification of the key contributors, and points toward feature roles consistent with known IgG and innate immune cell biology.

For ADCC, the key contribution using the complete feature set is made by IgG1.gp41, consistent with ADCP in terms of subclass, but driven by a different antigen. In contrast, there appears to be less contribution from IgG3, and IgG4 contributes positively (though the confidence in that coefficient is lower). Several of the selected features are gp41-specific. These trends are also largely reflected in the unsupervised feature:function correlations in Fig 2A. The cytokine feature usage is driven by IgG1 and IgG3 (with different antigens), along with an inconsistent contribution from IgG4, negative with p24 and gp140 and positive with gp41. Since these features are themselves highly correlated (Fig 2C), it appears that, despite the penalization in the PLR approach, this model is likely to be overfit. For both functions, feature filtering results in much the same relative contributions as for the complete feature set, with coefficients more strongly focused on a few key features. Notably, the inconsistent use of IgG4 features is eliminated by filtering. The ADCC response for the PC features is driven by PC6, which appears primarily to distinguish the V1V2-specificity. The PC features selected for the cytokines are more consistent with the other feature sets, with PC2 (IgG2/4 vs. 1/3) modulated by PC6 (V1V2), along with an IgG4.V1V2 down-selection via PC7.

Thus we also performed classification into the top and bottom quartiles (ignoring the middle half). While, unsurprisingly, the best vs. worst classification performance was better than the better vs. worse, our focus was the features driving class assignment, which remained largely consistent (results not shown). In particular, IgG1, with a variety of antigens, was the dominating contributor, often complemented by an IgG3-based feature; in addition, IgG4 features contributed negatively to ADCP but positively to the other two functions.
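The top-versus-bottom-quartile labeling can be sketched as below. The helper name and the handling of sample sizes not divisible by four (integer division) are assumptions for illustration.

```python
def quartile_split(values):
    """Map subject index -> 0 (bottom quartile) or 1 (top quartile).

    Subjects in the middle half are omitted from the returned dict.
    """
    q = len(values) // 4
    if q == 0:
        return {}  # too few subjects to form quartiles
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = {i: 0 for i in order[:q]}
    labels.update({i: 1 for i in order[-q:]})
    return labels

# Toy function values for eight hypothetical subjects.
values = [5.0, 1.0, 8.0, 3.0, 7.0, 2.0, 6.0, 4.0]
labels = quartile_split(values)
```

Only the labeled subjects would be passed to the classifier, which makes the two classes better separated than under a median split at the cost of using half as many subjects.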

Supervised learning: Regression

Again, three representative techniques were used to broadly assess the general ability of the data to support predictive models: Lars (regularized linear regression based on Lasso), Gaussian process regression (a nonlinear model), and support vector regression (a kernel-based model). We again built separate models for each function, under each set of input features.

While 200-replicate fivefold cross-validation was used for performance assessment, leave-one-out cross-validation (LOOCV) was used to generate representative scatterplots of experimental vs. predicted functional values, as is appropriate when viewing LOOCV as a form of jackknife. The models are clearly predictive of ADCP, obtaining a mean Pearson correlation coefficient PCC = 0.64 (standard deviation 0.15) over the 200-replicate fivefold cross-validation. An example LOOCV scatterplot is illustrated in Fig 4A; the correlated trend between observed and predicted ADCP is clear. Notably, the LOOCV and fivefold PCCs (Fig 4B) were similar.
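The leave-one-out procedure and the PCC between observed and predicted values can be sketched as follows. Ordinary least squares is used here purely as a simple stand-in for the regularized Lars model, and the data are synthetic.

```python
import numpy as np

def loocv_predictions(X, y):
    """Predict each subject from a linear model fit on all other subjects.

    Uses ordinary least squares with an intercept (a stand-in for Lars).
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])     # prepend intercept column
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i             # hold out subject i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef
    return preds

# Synthetic data: 30 subjects, 2 features, linear signal plus small noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=30)

preds = loocv_predictions(X, y)
pcc = np.corrcoef(y, preds)[0, 1]            # observed vs. predicted
```

Plotting `y` against `preds` gives the kind of observed-versus-predicted scatterplot described above, with every point predicted by a model that never saw that subject.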

As with penalized logistic regression, the regularization employed by Lars in training seeks to force coefficients to zero and yield a sparse model. Fig 4G depicts the coefficients and their p-values for a model trained on the entire set of features. Among the largest and most-confident coefficients, we see that IgG1.gp120 is again a strong positive contributor, joined by the related IgG1.gp41 and IgG3.p24, and IgG2.gp140 is a strong negative contributor. Despite the Lars penalization, the model incorporates offsetting positive and negative contributions from IgG4 under different antigens, though these features are highly correlated with each other (Fig 2C).

A possible statistical explanation for this is that the model works best when a few features are indicative of the response. A possible experimental explanation is that there are competitive effects, and indeed the contributions from multiple good antibodies are not additive in terms of recruiting effector cells.

Models learned from the filtered features from Fig 2B maintain about the same accuracy (mean PCC = 0.61 with standard deviation 0.15 for the 200-replicate fivefold (Fig 4D); an example LOOCV scatterplot is illustrated in Fig 4C). By inspecting features for a model trained on the filtered features (Fig 4H), we see that the prediction is driven primarily by IgG1.gp120 and IgG3.V1V2, with a negative contribution from IgG2.gp140. The contradictory IgG4 contribution is resolved. Similarly, PCA-based models attain mean PCC of 0.61 with standard deviation 0.15 (Fig 4E and 4F), based largely on PC2 (IgG2/4 vs. 1/3) and somewhat on PC3 (IgG4 vs. others), as can be seen in Fig 4I. The performance of all three machine learning methods using all three feature sets is summarized in Table 1. As with classification, the linear model dominates, and all methods perform similarly well with any of the input feature sets.

While providing the desired trend overall (with a few striking outliers), the ADCC regression with the complete feature set does not have as high a PCC (mean 0.40, standard deviation 0.18) as the ADCP one (mean 0.64, standard deviation 0.15). With a mean PCC of 0.58 and a standard deviation of 0.20, the cytokine regression is comparable to that observed in predicting ADCP, though the representative scatterplot is not as pleasing to the eye due to the density of subjects with low values. Feature filtering achieves essentially the same performance for ADCC but a degradation in the cytokine performance as assessed by PCC, though the scatterplot appears roughly as good. The switch to PC features degrades the PCC measurements for both functions, though again yielding trends that appear satisfactory visually.

As we saw for classification, the cytokine model has positive IgG1 and IgG3 contributions and inconsistent IgG4 contributions. For the filtered features, the ADCC model is focused on IgG1.gp41, with IgG1.gp140 replaced by the related IgG3.gp140. The feature-filtered model for cytokines retains IgG3.V1V2 and IgG1.gp120 contributions and resolves the IgG4 inconsistency, leaving a positive IgG4.gp41 contribution as observed in Fig 2A. When switching to the PCA-derived features, the ADCC regression model is driven by PC6 (V1V2), as with the classification model, while the cytokine regression model agrees with the classification model in its use of PC6 and PC7 with opposing signs, while weakening PC2 (IgG2/4 vs. 1/3) perhaps in lieu of added contributions from PC3 (IgG4) and PC4 (IgG3).

Once again the linear model dominates the nonlinear models, particularly for ADCC. With the complete feature set, this is likely directly attributable to overfitting, and the nonlinear methods improved when starting with the filtered features, though not as much with the PC features. As discussed in the methods, the presented results employ a polynomial kernel for Gaussian Process Regression and a radial basis kernel for Support Vector Regression; alternative kernels did not improve the performance. While the disappointing performance of the more sophisticated methods could potentially be improved by custom feature selection methods or parameter tuning, our goal here is not to provide such a benchmark but rather to establish the general scheme of predictive modeling of antibody feature:function relationships. The overall concordance observed between different feature sets, different regression and classification methods, and across multiple, complex, antibody functional activities, subjected to cross-validation assessment, demonstrates that indeed antibody features can be used to effectively predict functional activities.

Discussion

Sets of features emerge from patterns in the data, and these feature sets are able to robustly predict high/low levels of function, and are even informative enough to support quantitative predictions of functional activity. The subclass-specific contributions observed here are consistent with expectations, according to the receptors on the relevant effector cells, and the activity profiles among IgG subclasses [26]. At the same time, the approach provides a finer resolution picture of the interrelationships among antigen specificity, subclass, and effector function.

Thus while the prime included the gp120, gp41, and p24 antigens evaluated here, the boost only included gp120. Furthermore, cell-based functional assays employed particular antigens to stimulate a response, and those studied here are gp120-specific. Thus we might expect to see differences within functional responses among subjects according to different overall specificities of their antibodies, or even within antibody specificities depending on whether they were raised in the setting of the prime or the boost. Accordingly, associations observed here, such as those between gp41-specific antibodies and functional activity in assays in which only gp120 is presented, clearly do not have mechanistic significance with respect to functional assays that characterize only gp120-specific responses. However, they may nonetheless provide useful associative markers that functionally differentiate overall antibody responses to priming and boosting or among subjects that were more finely grained than subclass and antigen-specificity alone.

These approaches incorporate multiple features into a model, but do so in a way that avoids simply “memorizing” artifacts of the samples, as is easily possible with a sufficient number of features for a small sample set. Cross-validation analysis then ensures that the models are not overfit, by testing how well predictions from a model trained on one set of data match observations for another set. This predictive assessment stands in contrast to typical correlation analysis, which uses all the data and simply evaluates quality of fit.

To account for redundancy, we have used representative, common approaches including feature selection within the learning algorithm (via regularization), feature filtering (via feature clustering), and feature combination (via principal components analysis). The approaches were all fairly comparable in performance for this dataset, perhaps due to the relatively small number of initial features. Larger feature sets may result in more substantial differences, and require additional techniques to reduce the number of features contributing to a model down from a highly redundant input set to a reduced but representative and robust set. For example, elastic net type approaches [27] might strike a beneficial balance between eliminating redundant features and averaging them out to improve robustness.

Several representative methods were demonstrated, though a rigorous benchmarking comparison was not performed as that would require a larger, more diverse dataset. We conclude that while there are some clear differences in performance among the methods, they all show that there is sufficient information in the features to predictively model function. The penalized generalized linear models are generally very good, and provide the added advantage of easy interpretation and relatively low model complexity; as noted in the previous paragraph, a softer regularization might be beneficial in the future.

As an illustration, we note that subsequent to our modeling and characterization of feature:function relationships in the RV144 data, depletion studies confirmed a mechanistic role for antibodies associated with prediction quality. These experimental observations demonstrated that indeed IgG3 is important for a strong phagocytic response, with IgG3-depleted samples having significantly reduced ADCP activity [23]. Similarly, our models predicted that IgG4 has a negative impact on functional level, and an analogous depletion experiment did exhibit this trend across 2 different vaccine regimens, although the increase in activity in the RV144 samples when IgG4 was depleted did not meet statistical significance [23].

However, the approach described here can also be productively applied in other settings, shedding light on relationships specific to particular cohorts, as well as different vaccination and infection contexts. By integrating diverse datasets, it may even be possible to uncover more general rules governing the ways that antibodies bridge the adaptive and innate arms, and how those rules can then be specialized in a context-dependent fashion.

To this end, an important next step is a case/control study with the potential to tease apart signatures leading to protection. Even in the context of the functions assayed here, a more complex multi-output model could be built in order to ascertain signatures of desirable polyfunctional responses. The fact that some functions were better predicted than others in the models described here may indicate that additional antibody feature information could contribute to improved model performance. In particular, ADCC activity, the function predicted most poorly by the antigen and subclass data used here, is known to be dependent on antibody glycosylation state [22], which was not assessed in this study. Feature data could be extended to characterize a wider range of relevant antibody features, including additional antigen specificities as well as characteristics of the Fc glycan structure, or interactions with the cellular antibody receptors expressed by NK cells and phagocytes. Overall, we find that the parallel assessment of antibody function and antibody features can provide for development of models enabling quantitative predictions of functional activity across multiple, divergent antibody activities. These antibody functions have been associated with better clinical outcomes in HIV infected subjects, as well as the protection observed in RV144 and in many settings beyond HIV infection, but are poorly predicted by antibody titer. We therefore anticipate that this type of predictive model can provide significant value, both in terms of permitting the substitution of high-throughput biophysical characterization for low-throughput cell-based assays, as well as for uncovering novel structure:function relationships that can inform vaccine design efforts.

Methods

Data collection and preprocessing

Experimental methods used have been previously described [23]. Briefly, IgG was purified from all samples using Melon Gel according to the manufacturer’s instructions (Thermo Scientific). The functional activity of HIV-specific antibodies was determined in 3 different cell-based assays. Phagocytic activity was assessed using a monocyte-based assay in which the uptake of gp120-coated fluorescent beads is determined by flow cytometry [24]. Antibodies were tested at a concentration of 25 µg/ml MN. Similarly, the cytotoxicity profile of antibodies was tested at a concentration of 100 µg/ml in the rapid fluorescent ADCC assay, which assesses the ability of antibodies to drive primary NK cells to lyse gp120-pulsed target cells [25]. Lastly, NK cell degranulation and cytokine secretion were monitored by flow cytometry as described [23]. Surface expression of CD107a, and intracellular production of IFN-γ and MIP-1β were assessed, and the fraction of NK cells which were triple positive was determined. In order to profile antibody features, a customized antigen microsphere array was used to assess antibody specificity (gp120, gp140, V1V2, gp41, and p24) and subclass (IgG1,2,3,4) [14].

Background signal level was derived from the values for that feature among placebos, as the placebo mean plus one standard deviation. This background was subtracted from each vaccinee. Finally, the vaccinee values for the feature were scaled and centered to a mean of 0 and a standard deviation of 1, with values truncated to 6σ. For functional assays, data was not placebo-subtracted, but was instead inspected to ensure that low activity was observed in samples from placebo subjects.
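The feature preprocessing described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `preprocess_feature`, the use of the sample (n-1) standard deviation, the symmetric truncation, and the default truncation limit `clip` are all assumptions.

```python
import statistics

def preprocess_feature(vaccinee_values, placebo_values, clip=6.0):
    """Background-subtract, standardize, and truncate one antibody feature.

    `clip` is the truncation limit in standard deviations; its default
    value here is an assumption made for illustration.
    """
    # Background = placebo mean plus one placebo standard deviation.
    background = statistics.mean(placebo_values) + statistics.stdev(placebo_values)
    subtracted = [v - background for v in vaccinee_values]

    # Center and scale across vaccinees to mean 0 and standard deviation 1.
    mu = statistics.mean(subtracted)
    sd = statistics.stdev(subtracted)
    z = [(v - mu) / sd for v in subtracted]

    # Truncate extreme standardized values symmetrically (assumed).
    return [max(-clip, min(clip, v)) for v in z]
```

Note that truncation is applied after standardization, so a feature with truncated outliers will no longer have exactly unit standard deviation.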

Unsupervised learning

Antibody feature:function and feature:feature correlations were computed over the set of 80 vaccinated subjects and assessed using Pearson correlation coefficient and p-value.

Hierarchical clusters were generated by the Ward linkage algorithm [28], assessing pairwise similarity between profiles in terms of Pearson correlation coefficient (i.e., 1 − r dissimilarity). By visual inspection, six groups were identified in the resulting dendrogram. The R package NbClust was also used to assess optimal numbers of clusters according to a number of different indices [29]. For each function and each group, the feature with the largest-magnitude feature:function correlation coefficient was identified; each such feature also had the best feature:function p-value within its group, ≤ 0.001.
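As a rough sketch of the cluster-based filtering step, the code below computes the correlation-based dissimilarity between features and then picks, for each group, the feature with the largest-magnitude correlation to a function. The cluster assignments are taken as given (Ward clustering itself is not implemented here), and the function names and the feature labels in the usage note are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def dissimilarity(features):
    """Pairwise 1 - r dissimilarity between feature profiles."""
    names = sorted(features)
    return {(a, b): 1.0 - pearson(features[a], features[b])
            for a in names for b in names}

def pick_representatives(features, clusters, function_values):
    """For each cluster, keep the feature with the largest |r| to the function."""
    best = {}
    for name, values in features.items():
        r = pearson(values, function_values)
        group = clusters[name]
        if group not in best or abs(r) > abs(best[group][1]):
            best[group] = (name, r)
    return best
```

For example, with `features = {"IgG1.gp120": [...], "IgG3.gp120": [...]}` and `clusters = {"IgG1.gp120": 0, "IgG3.gp120": 0}`, `pick_representatives` returns a dictionary mapping each cluster id to a `(feature name, r)` pair. Selecting by magnitude means that negatively associated features can also serve as representatives.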

Singular value decomposition was employed to determine a set of eigenvectors and corresponding eigenvalues, with the eigenvectors serving as a basis transformation matrix containing principal components that are linear combinations of the original features, and the eigenvalues indicating the amount of variance in the data captured by their eigenvectors. The top 7 were chosen for further use in supervised methods, by visual inspection of their components and their eigenvalues.
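In standard notation, the relationship between the decomposition and the variance captured can be written as follows. The symbols are introduced here for illustration and are not taken from the source: X is the centered subject-by-feature matrix with n rows, σ_k are its singular values, the columns of V are the eigenvectors (principal components), and Z holds the subjects' coordinates on the 7 retained components. The 1/(n−1) normalization is the usual sample-covariance convention and is an assumption.

```latex
X = U \Sigma V^{\top}, \qquad
\lambda_k = \frac{\sigma_k^{2}}{n-1}, \qquad
\text{relative variance}_k = \frac{\lambda_k}{\sum_{j} \lambda_j}, \qquad
Z = X \, V_{1:7}
```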

Supervised learning: Classification

Thus the learning favors sparse models, as zero-valued coefficients do not contribute to the penalty term. The R package “penalized” was used for PLR. It employs a greedy search to determine the best value for the penalty weight λ according to nested cross-validation (i.e., given a training set, doing an internal cross-validation within it to determine the performance under possible λ choices).
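One common form of such a sparsity-favoring objective is the L1-penalized (lasso) logistic regression shown below. It is given as an assumption about the general shape of the penalty, since the defining text is not reproduced here; the symbols (y_i for the high/low label of subject i, x_i for its feature vector, β for the coefficients, λ for the penalty weight) are introduced for illustration.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \left[ -\sum_{i=1}^{n} \Big( y_i \log p_i + (1 - y_i) \log (1 - p_i) \Big)
         + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right],
\qquad
p_i = \frac{1}{1 + \exp\!\big( -(\beta_0 + x_i^{\top} \beta) \big)}
```

A coefficient that is exactly zero adds nothing to the second term, which is why larger penalty weights push the fit toward models that use fewer features.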

The R package “RRF” was used for RRF-based learning. Two parameters were specified: mtry, the number of features to be randomly sampled at each split, which was set to the number of input features; and ntree, the number of trees or bootstrap samples, which was set to 2000 to obtain more reliable results. The regularization parameter is handled automatically by the method, based on the scores from a 0-penalty model.

The R package “e1071”, based on the C-classification method of the libsvm library [34], was used for SVM-based classification. The standard linear, polynomial, and radial basis kernels were evaluated, and results presented for the radial basis function. Default parameter values were used except where noted.

Each method was trained separately for each function with each of three different feature sets: the complete preprocessed set, the filtered set from the feature:feature clustering, and the set of principal components. To study the impact of selecting different features in the cluster-based filtering, the Lars method was also applied to each possible set of features combining one from each cluster. To obtain robust characterization of classification performance, 200-replicate fivefold cross-validation was employed; i.e., the data was randomly split into fifths, four used for training and one for testing, with 200 different such training/testing runs. The R package “ROCR” was used to calculate a cutoff-independent evaluation of the area under the ROC curve (AUC) for each replicate. To gain insights into the features driving the PLR classification performance, a model was also built using all subjects in order to obtain the best confidence in the coefficients. In order to evaluate the impact (both prediction quality and feature usage) of median-based dichotomization, the PLR-based approach was applied in the same manner to a dataset limited to the subjects with the top and bottom quartile ADCP values.
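The evaluation protocol can be sketched as below: dichotomize the functional readout at its median, run replicated fivefold cross-validation, and compute one AUC per replicate from the held-out scores. This is an illustration under several assumptions: `fit_mean_difference` is a deliberately simple stand-in scorer (not PLR, RRF, or SVM), held-out scores are pooled across the five folds before computing each replicate's AUC, values equal to the median are labeled low, and all function names are hypothetical.

```python
import random
import statistics

def dichotomize(values):
    """Label each subject 1 (high) or 0 (low) relative to the median."""
    med = statistics.median(values)
    return [1 if v > med else 0 for v in values]

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_mean_difference(X, y):
    """Stand-in model: weight each feature by the difference of class means."""
    def class_mean(label, j):
        return statistics.mean(row[j] for row, l in zip(X, y) if l == label)
    w = [class_mean(1, j) - class_mean(0, j) for j in range(len(X[0]))]
    return lambda row: sum(wj * xj for wj, xj in zip(w, row))

def replicated_cv_auc(X, y, fit, replicates=200, folds=5, seed=0):
    """Return one AUC per replicate of k-fold cross-validation."""
    rng = random.Random(seed)
    n = len(y)
    aucs = []
    for _ in range(replicates):
        order = list(range(n))
        rng.shuffle(order)              # a fresh random split per replicate
        scores = [0.0] * n
        for k in range(folds):
            test = order[k::folds]      # every folds-th subject is held out
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in test:
                scores[i] = model(X[i])  # score only unseen subjects
        aucs.append(auc(scores, y))
    return aucs
```

Because every subject is scored by a model that never saw it during training, the spread of the per-replicate AUCs reflects predictive performance rather than quality of fit.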

Supervised learning: Regression

The R package “parcor” was used for Lars. As with PLR, the penalty weight was selected by cross-validation. The parameter for the number of splits was set to 10 for robust fitting.

Observed values are used to fit the functions and thereby predict unobserved ones. The R package “kernlab” was used for GP. A polynomial kernel function was used to fit the GP model, as it performed better than other kernels.

The R package “kernlab” was also used for SVR. As with SVM, we evaluated the standard linear, polynomial, and radial basis kernels and presented the results for the radial basis function. Default parameter values were used except where noted. The different feature sets were tested as described in the classification section.

The PCC was computed over 200-replicate fivefold cross-validation. In addition, leave-one-out cross-validation was performed in order to generate representative scatterplots. A Lars model was trained on all subjects in order to enable inspection of feature coefficients.
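A leave-one-out evaluation of a regression model can be sketched as follows. For brevity the model is a single-feature least-squares line, which is only a stand-in for Lars, GP, or SVR; the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (PCC) between two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def loocv_predictions(x, y):
    """Predict each subject from a model trained on all other subjects."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds
```

Plotting `loocv_predictions(x, y)` against the observed `y` gives the kind of predicted-versus-observed scatterplot described above, and `pearson(preds, y)` summarizes it as a PCC.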

Supporting Information

(A) Relative variance and (B) log absolute variance captured by each principal component. Red lines indicate truncation after the 7 leading principal components, which capture most of the variance and are most readily interpretable.

(A-F) Prediction results by 200-replicate fivefold cross-validation, illustrating PLR values (>0.5 predicted high ADCP; <0.5 predicted low) for one replicate (A,C,E) and providing area under the ROC curve (AUC) over all 200 replicates (B,D,F). Box & whisker plots show the median (thick center line), upper and lower quartiles (box), and 1.5 times the interquartile range (whiskers); all points are also plotted in a jittered stripchart. Colors for the classification examples indicate high (red) and low (blue) observed ADCP. (G-I) Coefficients and p-values of the features for a model trained on all subjects. Different input features were used in classification: (A,B,G) the complete set; (C,D,H) the filtered set; (E,F,I) the principal components. Colors for the feature coefficients indicate antibody subclass and antigen-specificity. For conve-

S4 Fig. Regression modeling of ADCP from antibody features by Lars. (A-F) Representative regression scatterplot based on leave-one-out cross-validation (A,C,E), and PCCs for 200-replicate fivefold cross-validation (B,D,F). (G-I) Coefficients and p-values of the features for a model trained on all subjects. Different input features were used: (A,B,G) the complete set; (C,D,H) the filtered set; (E,F,I) the principal components. Box & whisker plots show the median (thick center line), upper and lower quartiles (box), and 1.5 times the interquartile range (whiskers); all points are also plotted in a jittered stripchart. Colors for the feature coefficients indicate antibody subclass and antigen-specificity.

Acknowledgments

We gratefully acknowledge the assistance of Charla Andrews in managing the study.

Author Contributions

Performed the experiments: AWC TJS GA. Analyzed the data: IC MEA CBK. Contributed reagents/materials/analysis tools: IC CBK. Wrote the paper: IC AWC TJS GA MEA CBK. Designed the study: IC GA MEA.

Designed, developed, and applied the analysis approaches: IC MEA CBK. Designed and conducted vaccine trials: SRN PP SN JK RJO DF MLR NLM JHK.