Abstract | To lower cost and reduce the time to obtain promising peptides, machine learning approaches can greatly assist in the process and even partly replace expensive laboratory experiments by learning a predictor from existing data or from a smaller amount of newly generated data. |
Abstract | We focused on recent advances in kernel methods and machine learning to learn a predictive model with proven success. |
Abstract | Moreover, the proposed approach does not require the use of known ligands for the target protein since it can leverage recent multi-target machine learning predictors where ligands for similar targets can serve as initial training data. |
Author Summary | Machine learning techniques may replace expensive in-vitro laboratory experiments by learning an accurate model of them. |
Author Summary | We focused on recent advances in kernel methods and machine learning to learn a model that already had excellent results. |
Introduction | Machine learning and kernel methods [8] have the potential to help with this endeavour. |
Introduction | Moreover, the proposed approach can be employed without known ligands for the target protein because it can leverage recent multi-target machine learning predictors [10, 14] where ligands for similar targets can serve as an initial training set. |
The Generic String kernel | Such kernels have been widely used in applications of machine learning to biology. |
The Generic String kernel | The GS kernel was also part of a method that won the 2012 Machine Learning Competition in Immunology [20]. |
The machine learning approach | The machine learning approach |
The machine learning approach | However, some machine learning methods, such as neural networks and their derivatives (deep neural networks), are not compatible with the proposed methodology. |
Abstract | In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). |
Abstract | This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates. |
Discussion | We have demonstrated that the integration of antibody feature and function data via machine learning models and methods helps identify and make use of critical landmarks in the complex landscape of antibody feature:function activity. |
Introduction | Thus, this trial represents a compelling opportunity to profile antibody structure:function relationships from the standpoint of relevance to protection and an excellent setting in which to apply machine learning methods to characterize the relationship between antibody features and function in a population whose response to vaccination varied in a clinically relevant way. |
Introduction | In order to discover and model multivariate antibody feature:function relationships in data from RV144 vaccinees, we employ a representative set of different machine learning methodologies, within a cross-validation setting that assesses their ability to make predictions for subjects not used in model development. |
Introduction | While “predict” often connotes prospective evaluation, here, as is standard in statistical machine learning, it means only that models are trained with data for some subjects and are subsequently applied to other subjects in order to forecast unknown quantities from known quantities. |
Results | As discussed in the introduction, the data and correlation analyses have been previously presented [23]; we recapitulate the most relevant points here to lead into our machine learning approaches. |
Results | In order to better extract, assess, and utilize such observations, machine learning techniques were applied to provide models of the relationship between characteristics of HIV-specific antibodies induced by vaccination, and their functional activity. |
Supervised learning: Classification | All three machine learning techniques perform quite well, despite the difficulty of the median-split classification problem and the rigorous fivefold cross-validation assessment. |
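The median-split classification and fivefold cross-validation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the single numeric feature, the toy nearest-centroid classifier, the tie rule at the median, and all function names are hypothetical stand-ins for the three techniques and feature sets used in the paper.

```python
import random
from statistics import median


def median_split(values):
    """Label a value 1 if it lies above the median, else 0.

    Assumption: values equal to the median fall in the low class.
    """
    cut = median(values)
    return [1 if v > cut else 0 for v in values]


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def nearest_centroid_predict(train_x, train_y, x):
    """Toy one-feature classifier: choose the class with the closest mean."""
    means = {}
    for label in set(train_y):
        members = [v for v, y in zip(train_x, train_y) if y == label]
        means[label] = sum(members) / len(members)
    return min(means, key=lambda label: abs(means[label] - x))


def cross_validated_accuracy(features, activity, k=5, seed=0):
    """Fraction of held-out subjects whose high/low class is predicted correctly."""
    labels = median_split(activity)
    correct = 0
    for train, test in kfold_indices(len(features), k, seed):
        tx = [features[i] for i in train]
        ty = [labels[i] for i in train]
        for i in test:
            correct += nearest_centroid_predict(tx, ty, features[i]) == labels[i]
    return correct / len(features)
```

Each subject is predicted only by a model that never saw that subject during training, which is the sense of "predict" used in this setting.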
Supervised learning: Regression | The performance of all three machine learning methods using all three feature sets is summarized in Table 1. |
Supervised learning: Regression | Table 1 summarizes the performance for ADCC and cytokines under all machine learning techniques and feature sets. |
Author Summary | We then apply an ensemble of various machine learning algorithms to infer environmental and cellular information such as strain, growth phase, medium, oxygen level, antibiotic and carbon source. |
Inference of missing phase information using iterative learning | Inference is based on a consensus-based approach using the four machine learning methods described above. |
Introduction | As such, efficient training of machine learning methods is hindered by data complexity, compatibility and the curse of dimensionality that plagues datasets with thousands of features (genes) but only a few samples (conditions). |
Supporting Information | In each iteration, the phase of all samples that were originally unannotated is predicted, based on an ensemble of 4 machine learning methods (Naive Bayes, SVM, Decision Tree, KNN) that produce a consensus outcome, as described in the Methods section of the manuscript. |
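The consensus step over the four classifiers can be illustrated with a simple vote. The excerpt does not state the exact consensus rule, so this sketch assumes a plurality vote in which ties leave the sample unannotated; the function names and the phase labels in the usage note are hypothetical.

```python
from collections import Counter


def consensus(votes):
    """Return the label most classifiers agree on, or None on a tie."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # assumption: a tie leaves the sample unannotated
    return ranked[0][0]


def annotate(predictions_by_sample):
    """Map sample id -> consensus label, skipping samples without a winner."""
    result = {}
    for sample, votes in predictions_by_sample.items():
        label = consensus(votes)
        if label is not None:
            result[sample] = label
    return result
```

For example, `annotate({"s1": ["exponential", "exponential", "stationary", "exponential"]})` assigns `"exponential"` to `s1`, while a two-against-two split would leave the sample out of the result so that a later iteration could revisit it.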