Abstract | Our model uses a hierarchical prior to link the feature weights for shared features across several single-task models and the joint model. |
Base Models | Let θ be the vector of feature weights, and f(s, y_i, y_{i-1}) the feature function over adjacent segments y_i and y_{i-1} in sentence s. The log likelihood of a semi-CRF for a single sentence s is given by: |
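The equation itself is elided in the extract. A standard per-sentence semi-CRF log-likelihood under the notation above (a reconstruction, not the paper's verbatim equation) is:

```latex
\log P(y \mid s; \theta)
  = \sum_{i=1}^{|y|} \theta \cdot f(s, y_i, y_{i-1}) \;-\; \log Z(s),
\qquad
Z(s) = \sum_{y'} \exp\!\Big( \sum_{i=1}^{|y'|} \theta \cdot f(s, y'_i, y'_{i-1}) \Big)
```

where the partition function Z(s) sums over all candidate segmentations y' of sentence s.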
Base Models | Let θ be the vector of feature weights. |
Hierarchical Joint Learning | Each model has its own set of parameters (feature weights). |
Hierarchical Joint Learning | These have corresponding log-likelihood functions L_p(D_p; θ_p), L_n(D_n; θ_n), and L_j(D_j; θ_j), where the Ds are the training data for each model, and the θs are the model-specific parameter (feature weight) vectors. |
Hierarchical Joint Learning | These three models are linked by a hierarchical prior, and their feature weight vectors are all drawn from this prior. |
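One common way to realize such a linked prior (a hedged sketch; the shared top-level weights θ_* and the variance hyperparameters σ are assumptions, not taken from the source) is to draw each model's weight vector from a Gaussian centered on shared weights θ_*, which in turn have a zero-mean Gaussian prior:

```latex
\mathcal{L}_{\text{hier}}
  = \sum_{m \in \{p,\, n,\, j\}}
      \Big( \mathcal{L}_m(\mathcal{D}_m; \theta_m)
            - \frac{\lVert \theta_m - \theta_* \rVert^2}{2\sigma_m^2} \Big)
    \;-\; \frac{\lVert \theta_* \rVert^2}{2\sigma_*^2}
```

Maximizing this objective pulls the weights of shared features in the three models toward a common value, which is how singly-annotated data can influence the joint model.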
Introduction | Then, the singly-annotated data can be used to influence the feature weights for the shared features in the joint model. |
Background | where {h_m(f, e) | m = 1, ..., M} is a set of features, and λ_m is the feature weight corresponding to the m-th feature. |
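The surrounding definitions suggest the usual log-linear translation model (a standard formulation consistent with the symbols above, not quoted from the source):

```latex
P(e \mid f)
  = \frac{\exp\!\big( \sum_{m=1}^{M} \lambda_m h_m(f, e) \big)}
         {\sum_{e'} \exp\!\big( \sum_{m=1}^{M} \lambda_m h_m(f, e') \big)}
```

so decoding amounts to finding the translation e that maximizes the weighted feature sum.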
Background | In this work, Minimum Error Rate Training (MERT), proposed by Och (2003), is used to estimate the feature weights over a series of training samples. |
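The MERT criterion of Och (2003) can be stated as follows (a standard formulation; r_s denotes the reference for source sentence f_s, and ê its model-best hypothesis under weights λ_1^M):

```latex
\hat{\lambda}_1^M
  = \operatorname*{arg\,min}_{\lambda_1^M}
      \sum_{s=1}^{S} \mathrm{Err}\big( r_s,\; \hat{e}(f_s; \lambda_1^M) \big)
```

i.e., the weights are chosen to directly minimize the corpus-level error of the decoder's best output rather than a likelihood.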
Background | where h_t(e) is the log-scaled model score of e in the t-th member system, and β_t is the corresponding feature weight. |
Experiments on Parsing-Based SMT | The feature weights are tuned on the development set using minimum error rate training. |
Experiments on Phrase-Based SMT | Koehn's implementation of minimum error rate training (Och, 2003) is used to tune the feature weights on the development set. |
Improving Statistical Bilingual Word Alignment | feature weights, respectively. |
Experimental Setup | We learned the feature weights with a linear SVM, using the software SVM-OOPS (Woodsend and Gondzio, 2009). |
Experimental Setup | This tool directly gave us the feature weights as well as the support vector values, and it allowed different penalties to be applied to positive and negative misclassifications, enabling us to compensate for the unbalanced data set. |
Experimental Setup | For each phrase, features were extracted, and salience scores were calculated from the feature weights determined through SVM training. |
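As an illustration of the last step (the feature names and weight values below are hypothetical; SVM-OOPS training itself is not shown), once a linear SVM yields a weight vector, a phrase's salience score is simply the dot product of its feature values with those weights:

```python
# Hypothetical feature weights, as would be produced by linear-SVM
# training (illustrative values only; the paper uses SVM-OOPS).
weights = {"tf": 1.8, "position": -0.6, "in_title": 2.3}

def salience(features):
    """Salience score of a phrase: the dot product of its feature
    values with the learned feature weights (missing features
    contribute zero)."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())

# Score one phrase: 1.8*0.5 + (-0.6)*1.0 + 2.3*1.0
phrase_features = {"tf": 0.5, "position": 1.0, "in_title": 1.0}
print(salience(phrase_features))  # approximately 2.6
```

Phrases can then be ranked by this score to select the most salient ones.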