A multitask transfer learning solution | Let w_k denote the weight vector of the linear classifier that separates positive instances of auxiliary type A_k from negative instances, and let w_T denote a similar weight vector for the target type T.
A multitask transfer learning solution | If different relation types are totally unrelated, these weight vectors should also be independent of each other. |
A multitask transfer learning solution | But because we observe similar syntactic structures across different relation types, we now assume that these weight vectors are related through a common component v.
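The shared-component assumption above can be sketched numerically. In this toy example (the dimensions, random values, and variable names are illustrative, not from the paper), each relation type's classifier weight is the sum of a common component v and a type-specific component u_k:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6  # feature dimension (illustrative)

# Hypothetical shared component v, common to all relation types
v = rng.normal(size=d)

# Type-specific components; the full classifier weight for an
# auxiliary type A_k is w_k = v + u_k, and similarly w_T = v + u_T
u_aux = rng.normal(size=d)
u_tgt = rng.normal(size=d)
w_aux = v + u_aux
w_tgt = v + u_tgt

x = rng.normal(size=d)  # feature vector of one instance

# Each type's classifier is linear: predict positive if w . x > 0
print(np.sign(w_aux @ x), np.sign(w_tgt @ x))
```

Because v is shared, whatever the auxiliary types contribute to v is automatically available to the target classifier w_T.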
Abstract | The proposed framework models the commonality among different relation types through a shared weight vector, enables knowledge learned from the auxiliary relation types to be transferred to the target relation type, and allows easy control of the tradeoff between precision and recall.
Conclusions and future work | In the multitask learning framework that we introduced, different relation types are treated as different but related tasks that are learned together, with the common structures among the relation types modeled by a shared weight vector.
Experiments | H is the number of nonzero entries in the shared weight vector v. To see how the performance may vary as H changes, we plot the performance of TL-comb and TL-auto in terms of the average F1 across the seven target relation types, with H ranging from 100 to 50000.
Training method | Input: Training set S = {(x_t, y_t)} for t = 1, ..., T; Output: Model weight vector w
Training method | where w is a weight vector and f is a feature representation of an input x and an output y. |
Training method | Learning a mapping between input-output pairs corresponds to finding a weight vector w such that the best-scoring path of a given sentence is the same as (or close to) the correct path.
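The scoring step described here can be illustrated with a minimal sketch (the candidate paths, feature vectors, and weights below are invented for illustration): each candidate output y is scored linearly as w . f(x, y), and decoding returns the argmax.

```python
import numpy as np

def score(w, f_xy):
    """Linear score w . f(x, y) for an input-output pair."""
    return float(np.dot(w, f_xy))

# Hypothetical feature vectors f(x, y) for three candidate paths y
candidates = {
    "y1": np.array([1.0, 0.0, 2.0]),
    "y2": np.array([0.0, 1.0, 1.0]),
    "y3": np.array([2.0, 1.0, 0.0]),
}
w = np.array([0.5, -1.0, 1.0])

# Decoding picks the best-scoring path; training adjusts w so that
# this argmax matches (or is close to) the correct path.
# Here y1 scores 2.5 while y2 and y3 both score 0.0.
best = max(candidates, key=lambda y: score(w, candidates[y]))
print(best)
```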
Introduction | L1 regularization penalizes the weight vector for its L1-norm (i.e.
Log-Linear Models | In effect, it forces the weight to receive the total L1 penalty that would have been applied if the weight had been updated by the true gradients, assuming that the current weight vector resides in the same orthant as the true weight vector.
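A minimal sketch of the clipping idea behind this orthant constraint (my own simplification, not the exact algorithm from the paper): when an accumulated L1 penalty would push a weight past zero, the weight is clipped at zero rather than allowed to cross into the opposite orthant.

```python
def clip_l1(w_i, penalty):
    """Apply an L1 penalty of size `penalty` to a single weight,
    clipping at zero so the weight never crosses into the other
    orthant (sign flips become exact zeros instead)."""
    if w_i > 0.0:
        return max(0.0, w_i - penalty)
    if w_i < 0.0:
        return min(0.0, w_i + penalty)
    return 0.0

# A small positive weight is driven exactly to zero; a larger
# negative weight is only shrunk toward zero.
print(clip_l1(0.3, 0.5), clip_l1(-0.8, 0.5))
```

This clip-at-zero behavior is what produces sparse models: weights that the penalty would flip in sign are set to exactly zero instead.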
Log-Linear Models | problem as an L1-constrained problem (Lee et al., 2006), where the conditional log-likelihood of the training data is maximized under a fixed constraint on the L1-norm of the weight vector.
Log-Linear Models | (2008) describe efficient algorithms for projecting a weight vector onto the L1-ball.
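One such projection, often attributed to Duchi et al. (2008), sorts the absolute weights and soft-thresholds them so that the result lands exactly on the L1-ball of radius z; the sketch below is a straightforward NumPy rendering of that idea (variable names are mine, not from the cited work).

```python
import numpy as np

def project_l1_ball(w, z=1.0):
    """Euclidean projection of w onto the L1-ball of radius z."""
    if np.abs(w).sum() <= z:
        return w.copy()  # already inside the ball
    u = np.sort(np.abs(w))[::-1]          # sorted magnitudes, descending
    css = np.cumsum(u)
    # Largest index rho where the soft-threshold keeps u[rho] positive
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - z))[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)  # shrinkage amount
    return np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)

p = project_l1_ball(np.array([0.5, -1.0, 2.0]), z=1.0)
print(p)  # result has L1-norm exactly z
```

For w = [0.5, -1.0, 2.0] and z = 1 the projection is [0, 0, 1]: the small entries are zeroed out and only the dominant weight survives, which is why this constraint yields sparse weight vectors.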
Collaborative Decoding | Let λ_m be the feature weight vector for member decoder d_m; the training procedure proceeds as follows:
Collaborative Decoding | For each decoder d_m, find a new feature weight vector λ'_m which optimizes the specified evaluation criterion L on D using the MERT algorithm, based on the n-best list J_m generated by d_m:
Collaborative Decoding | where T denotes the translations selected by re-ranking the translations in J_m using the new feature weight vector λ.
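Re-ranking an n-best list with a feature weight vector, as described here, amounts to taking the argmax of λ . f over the candidates. A toy sketch (the candidate translations and feature values are invented; real MERT features would be model scores, language-model probabilities, etc.):

```python
def rerank(nbest, lam):
    """Return the candidate whose features score highest under lam.

    nbest: list of (translation, feature_tuple) pairs.
    lam:   feature weight vector (one weight per feature).
    """
    return max(nbest, key=lambda cand: sum(l * f for l, f in zip(lam, cand[1])))

# Hypothetical 2-best list with two features per candidate
nbest = [
    ("translation A", (0.2, 1.0)),  # scores 1.0*0.2 + 0.5*1.0 = 0.7
    ("translation B", (0.9, 0.1)),  # scores 1.0*0.9 + 0.5*0.1 = 0.95
]
lam = (1.0, 0.5)
print(rerank(nbest, lam)[0])
```

After MERT updates λ, re-running this re-ranking over each decoder's n-best list yields the selected translations T used in the next round.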