Alignment | Sentences for which the decoder cannot find an alignment are discarded for the phrase model training. |
Alignment | N is used for both the initialization of the translation model p(f|ẽ) and the phrase model training. |
Conclusion | While models trained from Viterbi alignments already lead to good results, we have demonstrated |
Experimental Evaluation | ), the count model trained with leaving-one-out (l1o) and cross-validation (cv), the weighted count model and the full model. |
Experimental Evaluation | Further, scores are shown for a fixed log-linear interpolation of the count model (trained with leaving-one-out) with the heuristic, as well as for a feature-wise combination. |
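The fixed log-linear interpolation of two phrase models mentioned above can be sketched as follows; the interpolation weight and the probability values are illustrative assumptions, not numbers from the paper.

```python
import math

def log_linear_interpolate(p_count, p_heuristic, lam=0.5):
    """Combine two phrase translation probability estimates in log space:
    score = exp(lam * log p_count + (1 - lam) * log p_heuristic).
    With lam = 0.5 this is the geometric mean of the two estimates."""
    return math.exp(lam * math.log(p_count) + (1.0 - lam) * math.log(p_heuristic))

# Example: interpolating two estimates for the same (hypothetical) phrase pair.
p = log_linear_interpolate(0.2, 0.4, lam=0.5)
```

Unlike linear interpolation, the log-linear combination rewards phrase pairs that both component models assign non-negligible probability, which is why it is often used to combine a count-based model with the heuristic extraction scores.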
Introduction | [Figure: two training pipelines — a Viterbi word alignment with word translation models trained by the EM algorithm, followed by heuristic phrase extraction; and a phrase alignment with phrase translation models trained by the EM algorithm, yielding phrase translation counts; each produces probabilities for a phrase translation table.] |
Introduction | Our results show that the proposed phrase model training improves translation quality on the test set by 0.9 BLEU points over our baseline. |
Abstract | Experiments on joint parsing and named entity recognition, using the OntoNotes corpus, show that our hierarchical joint model can produce substantial gains over a joint model trained on only the jointly annotated data. |
Hierarchical Joint Learning | Our resulting joint model is of higher quality than a comparable joint model trained on only the jointly-annotated data, due to all of the evidence provided by the additional single-task data. |
Hierarchical Joint Learning | When we rescale the model-specific prior, we rescale based on the number of training examples in that model’s training set, not the total number of examples across all the models combined. |
Hierarchical Joint Learning | Having uniformly randomly drawn a datum d ∈ ⋃_{m∈M} D_m, let m(d) ∈ M tell us to which model’s training data the datum belongs. |
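The sampling step described above can be sketched as follows; the model names, datasets, and helper names (`pool`, `draw`, `prior_scale`) are illustrative assumptions, not identifiers from the paper.

```python
import random

# Each model m has its own training set D_m; a datum is drawn uniformly
# from the union of all sets, and m(d) recovers which model it came from.
datasets = {
    "joint": ["d1", "d2"],          # jointly annotated data
    "parse": ["d3", "d4", "d5"],    # parse-only data
    "ner":   ["d6"],                # NER-only data
}

# Flatten the union, remembering each datum's owning model m(d).
pool = [(m, d) for m, data in datasets.items() for d in data]

def draw():
    """Uniformly draw a datum d from the union; return (m(d), d)."""
    return random.choice(pool)

def prior_scale(m):
    """Rescale the model-specific prior by |D_m|, the size of that
    model's own training set, not the total size of the union."""
    return len(datasets[m])
```

Because the draw is uniform over the union, models with more data are visited proportionally more often, which is why the per-model prior is rescaled by |D_m| rather than by the combined data size.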
Introduction | We built a joint model of parsing and named entity recognition (Finkel and Manning, 2009b), which had small gains on parse performance and moderate gains on named entity performance, when compared with single-task models trained on the same data. |
Introduction | entity models trained on larger corpora, annotated with only one type of information. |
Introduction | We use a hierarchical prior to link a joint model trained on jointly-annotated data with other single-task models trained on single-task annotated data. |
Conclusion | Models trained on parser-annotated Wikipedia text and MEDLINE text had improved performance on these target domains, in terms of both speed and accuracy. |
Results | For all four algorithms the training time is proportional to the amount of data, but the GIS and BFGS models trained on only CCGbank took 4,500 and 4,200 seconds to train, while the equivalent perceptron and MIRA models took 90 and 95 seconds to train. |
Results | For speed improvement these were MIRA models trained on 4,000,000 parser- |
Results | In particular, note that models trained on Wikipedia or the biomedical data produce lower F-scores than the baseline on newswire. |
Clickthrough Data and Spelling Correction | Unfortunately, we found in our experiments that the pairs extracted using the method are too noisy for reliable error model training, even with a very tight threshold, and we did not see any significant improvement. |
Clickthrough Data and Spelling Correction | Although by doing so we could miss some basic, obvious spelling corrections, our experiments show that the negative impact on error model training is negligible. |
Clickthrough Data and Spelling Correction | We found that the error models trained using the data directly extracted from the query reformulation sessions suffer from the problem of underestimating the self-transformation probability of a query P(Q2=Q1|Q1), because we only included in the training data the pairs where the query is different from the correction. |
The Baseline Speller System | The language model (the second factor) is a backoff bigram model trained on the tokenized form of one year of query logs, using maximum likelihood estimation with absolute discounting smoothing. |
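A backoff bigram model with absolute discounting, of the kind described above, can be sketched as follows; the toy corpus, the discount value, and the function names are illustrative assumptions.

```python
from collections import Counter

def train_bigram_ad(tokens, discount=0.5):
    """Minimal sketch: maximum-likelihood bigram estimates with absolute
    discounting, backing off to the unigram distribution."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)
    # N1+(v, .): number of distinct words observed after history v
    followers = Counter(v for (v, _w) in bigrams)

    def prob(v, w):
        p_uni = unigrams[w] / total  # unigram backoff distribution (MLE)
        c_v = sum(c for (h, _), c in bigrams.items() if h == v)
        if c_v == 0:
            return p_uni             # unseen history: back off completely
        # discounted bigram estimate plus redistributed backoff mass
        backoff_mass = discount * followers[v] / c_v
        return max(bigrams[(v, w)] - discount, 0.0) / c_v + backoff_mass * p_uni

    return prob

# Toy usage on a tiny "query log": for each seen history, the
# probabilities over the vocabulary sum to 1.
lm = train_bigram_ad("a b a b a c".split())
```

Subtracting a fixed discount from every seen bigram count frees exactly `discount * N1+(v, .)` mass per history, which is then spread over the unigram distribution, so the conditional distribution stays properly normalized.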
Empirical Evaluation | Similarly, we call the models trained from this data supervised, though full supervision was not available. |
Empirical Evaluation | Additionally, we report the results of the model trained with all 750 texts labeled (Supervised UB); its scores can be regarded as an upper bound on the results of the semi-supervised models. |
Empirical Evaluation | Surprisingly, its precision is higher than that of the model trained on 750 labeled examples, though admittedly it is achieved at a very different recall level. |
Results | Table 1 shows sample semantic classes induced by models trained on the corpus of BNC verb-object co-occurrences. |
Results | unsurprising given that they are structurally similar models trained on the same data. |
Results | It would be interesting to do a full comparison that controls for size and type of corpus data; in the meantime, we can report that the LDA and ROOTH-LDA models trained on verb-object observations in the BNC (about 4 times smaller than AQUAINT) also achieve a perfect score on the Holmes et al. |