Experiments | maximizing the LM score over all possible permutations.
Experiments | Usually the LM score is used as one component of a more complex decoder score, which also includes biphrase and distortion scores.
Experiments | But in this particular “translation task” from bad to good English, we consider that all “biphrases” are of the form e — e, where e is an English word, and we do not take any distortion into account: we only consider the quality of the permutation as measured by the LM component.
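The search described above — scoring every permutation of the words with a language model and keeping the best one — can be sketched as follows. This is a brute-force illustration only; the bigram table and its log-probabilities are invented for the example, not taken from the paper:

```python
import itertools

# Toy bigram log-probabilities (hypothetical values, for illustration only).
# Keys are (previous_word, word) pairs; unseen bigrams get a floor score.
BIGRAM_LOGPROB = {
    ("<s>", "this"): -0.5, ("this", "is"): -0.4, ("is", "a"): -0.6,
    ("a", "sentence"): -0.7, ("sentence", "</s>"): -0.3,
}
FLOOR = -5.0  # penalty for any bigram not in the table

def lm_score(words):
    """Sum of bigram log-probabilities for a word sequence."""
    seq = ["<s>"] + list(words) + ["</s>"]
    return sum(BIGRAM_LOGPROB.get((p, w), FLOOR) for p, w in zip(seq, seq[1:]))

def best_permutation(bag_of_words):
    """Exhaustively search all permutations for the highest LM score."""
    return max(itertools.permutations(bag_of_words), key=lm_score)

print(best_permutation(["a", "is", "sentence", "this"]))
# -> ('this', 'is', 'a', 'sentence')
```

The exhaustive search is factorial in the number of words, which is exactly why the paper reformulates the problem as a TSP instead of enumerating permutations.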
Phrase-based Decoding as TSP | 4.1 From Bigram to N-gram LM |
Introduction | This provides a compelling advantage over previous dependency language models for MT (Shen et al., 2008), which use a 5-gram LM only during reranking.
Introduction | In our experiments, we build a competitive baseline (Koehn et al., 2007) incorporating a 5-gram LM trained on a large part of Gigaword and show that our dependency language model provides improvements on five different test sets, with an overall gain of 0.92 in TER and 0.45 in BLEU scores. |
Machine translation experiments | This LM required 16GB of RAM during training. |
Machine translation experiments | Regarding the small difference in BLEU scores on MT08, we would like to point out that tuning on MT05 and testing on MT08 had a rather adverse effect with respect to translation length: while the two systems are relatively close in terms of BLEU score (24.83 vs. 24.91), the dependency LM provides a much bigger gain when evaluated with BLEU precision (27.73 vs. 28.79), i.e., ignoring the brevity penalty.
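The gap between BLEU and BLEU precision comes from the brevity penalty, which multiplicatively down-weights candidates shorter than the reference. A minimal sentence-level sketch (real evaluation, as in the paper, is corpus-level with multiple references; the example sentences are invented):

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4, use_bp=True):
    """Toy sentence-level BLEU: geometric mean of modified n-gram
    precisions, with an optional brevity penalty (BP)."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped matches
        total = max(sum(cand.values()), 1)
        log_prec += math.log(max(overlap, 1e-9) / total)
    bp = 1.0
    if use_bp and len(candidate) < len(reference):
        bp = math.exp(1 - len(reference) / len(candidate))  # BP < 1 for short output
    return bp * math.exp(log_prec / max_n)

ref = "the cat sat on the mat".split()
cand = "the cat sat on the".split()  # too short: high precision, penalized by BP
print(bleu(cand, ref, use_bp=False), bleu(cand, ref, use_bp=True))
```

A system producing short but precise output thus looks better under BLEU precision than under full BLEU, which is the length effect discussed above.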
Related work | In the latter paper, Huang and Chiang introduce rescoring methods named “cube pruning” and “cube growing”, which first use a baseline decoder (either a synchronous CFG or a phrase-based system) without an LM to generate a hypergraph, and then rescore this hypergraph with a language model.
Experiment |
  VS    0.4196   0.4542   0.6600
  BM25  0.4235†  0.4579   0.6600
  LM    0.4158   0.4520   0.6560
  PMI   0.4177   0.4538   0.6620
  LSA   0.4155   0.4526   0.6480
  WP    0.4165   0.4533   0.6640
Experiment |
  Model     Precision  Recall  F-Measure
  BASELINE  0.305      0.866   0.451
  VS        0.331      0.807   0.470
  BM25      0.327      0.795   0.464
  LM        0.325      0.794   0.461
  LSA       0.315      0.806   0.453
  PMI       0.342      0.603   0.436
  DTP       0.322      0.778   0.455
  VS-LSA    0.335      0.769   0.466
  VS-PMI    0.311      0.833   0.453
  VS-DTP    0.342      0.745   0.469
Experiment | Differences in the effectiveness of VS, BM25, and LM stem from parameter tuning and corpus differences.
Term Weighting and Sentiment Analysis | IR models such as the Vector Space model (VS), probabilistic models such as BM25, and Language Modeling (LM), though they differ in approach and measure, employ heuristics and formal modeling to effectively evaluate the relevance of a term to a document (Fang et al., 2004).
Term Weighting and Sentiment Analysis | In our experiments, we use the Vector Space model with Pivoted Normalization (VS), the probabilistic model BM25, and Language Modeling with Dirichlet smoothing (LM).
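A minimal sketch of the Dirichlet-smoothed query-likelihood score used by the LM model above. This follows the standard formulation (μ = 2000 is a commonly used default); the helper name and the toy texts are our own, not from the paper:

```python
import math
from collections import Counter

def dirichlet_lm_score(query, doc, collection, mu=2000.0):
    """Query likelihood with Dirichlet smoothing:
    log p(q|d) = sum_w log( (c(w,d) + mu * p(w|C)) / (|d| + mu) ),
    where c(w,d) is the term count in the document and p(w|C) the
    collection language-model probability."""
    doc_tf, col_tf = Counter(doc), Counter(collection)
    score = 0.0
    for w in query:
        p_c = col_tf[w] / len(collection)  # collection LM probability
        if p_c == 0 and doc_tf[w] == 0:
            continue  # avoid log(0): terms absent everywhere carry no signal
        score += math.log((doc_tf[w] + mu * p_c) / (len(doc) + mu))
    return score

# Hypothetical two-document collection: the document containing the
# query term scores higher.
d1, d2 = ["apple", "pie"], ["banana", "bread"]
collection = d1 + d2
print(dirichlet_lm_score(["apple"], d1, collection))
print(dirichlet_lm_score(["apple"], d2, collection))
```

Larger μ pulls every document's score toward the collection model, which is one of the tuning sensitivities that can explain effectiveness differences between VS, BM25, and LM.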