Inferring a learning curve from mostly monolingual data | In scenario 2, the models trained from the seed parallel corpus and the features used for inference (Section 4) provide complementary information.
Inferring a learning curve from mostly monolingual data | Using the models trained for the experiments in Section 3, we estimate the squared extrapolation error at the anchors $x_j$ when using models trained on sizes up to $x_i$, and set the confidence in the extrapolations for $u$ to its inverse:
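A minimal sketch of this inverse-squared-error confidence weighting, assuming the extrapolated and inferred BLEU predictions are simply combined in proportion to their confidences; the function, variable names, and numbers below are illustrative, not the paper's exact procedure.

# Sketch: set each prediction's confidence to the inverse of its estimated
# squared extrapolation error, then combine predictions by normalized
# confidence. All names and numbers are illustrative assumptions.
def combine_predictions(pred_extrap, sq_err_extrap, pred_inferred, sq_err_inferred):
    conf_extrap = 1.0 / sq_err_extrap      # confidence = inverse squared error
    conf_inferred = 1.0 / sq_err_inferred
    total = conf_extrap + conf_inferred
    return (conf_extrap * pred_extrap + conf_inferred * pred_inferred) / total

# The prediction with the lower estimated error dominates the combination.
print(combine_predictions(23.5, 1.4, 21.8, 6.3))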
Inferring a learning curve from mostly monolingual data | For the cases where a slightly larger in-domain “seed” parallel corpus is available, we introduced an extrapolation method and a combined method yielding high-precision predictions: using models trained on up to 20K sentence pairs we can predict performance on a given test set with a root mean squared error in the order of 1 BLEU point at 75K sentence pairs, and in the order of 2-4 BLEU points at 500K.
Selecting a parametric family of curves | For a certain bilingual test dataset $d$, we consider a set of observations $O_d = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $y_i$ is the performance on $d$ (measured using BLEU (Papineni et al., 2002)) of a translation model trained on a parallel corpus of size $x_i$.
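To make the setup concrete, here is a minimal sketch of fitting one plausible parametric family, a three-parameter power law $y = c - a x^{-\alpha}$, to such observations; the family, the data points, and the 500K extrapolation target are assumptions for illustration only.

# Sketch: fit a power-law learning curve y = c - a * x**(-alpha) to
# observations (x_i, y_i) = (corpus size, BLEU). The family and the data
# points are illustrative, not the paper's actual choices.
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, c, a, alpha):
    return c - a * np.power(x, -alpha)

# Hypothetical observations: corpus sizes (sentence pairs) and BLEU scores.
x = np.array([5e3, 10e3, 20e3, 40e3, 80e3])
y = np.array([14.2, 17.1, 19.5, 21.3, 22.8])

params, _ = curve_fit(power_law, x, y, p0=[30.0, 100.0, 0.3], maxfev=10000)
c, a, alpha = params
print("Predicted BLEU at 500K pairs: %.2f" % power_law(5e5, c, a, alpha))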
Experiments and Results | Our baseline decoder is an in-house implementation of Bracketing Transduction Grammar (BTG) (Wu, 1997) with CKY-style decoding and a lexical reordering model trained with maximum entropy (Xiong et al., 2006).
Experiments and Results | The language model is a 5-gram language model trained on the target sentences in the training data.
Experiments and Results | The language model is a 5-gram language model trained on the Giga-Word corpus plus the English sentences in the training data.
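As a concrete illustration of this kind of setup, the sketch below trains a 5-gram language model with NLTK over whitespace-tokenized target-side sentences; the toolkit, the add-one smoothing, and the file name target_sentences.txt are assumptions for illustration, not the systems' actual LM training pipeline.

# Sketch: train a 5-gram language model on the target-side training
# sentences with NLTK (add-one smoothing for simplicity).
from nltk.lm import Laplace
from nltk.lm.preprocessing import padded_everygram_pipeline

with open("target_sentences.txt", encoding="utf-8") as f:
    sentences = [line.strip().split() for line in f]

order = 5
train_ngrams, vocab = padded_everygram_pipeline(order, sentences)
lm = Laplace(order)
lm.fit(train_ngrams, vocab)

# Probability of "fire" following a 4-word context under the 5-gram model.
print(lm.score("fire", ["the", "house", "is", "on"]))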
Graph Construction | Note that, due to pruning in both decoding and translation model training, forced alignment may fail, i.e.
Hello. My name is Inigo Montoya. | First, we show a concrete sense in which memorable quotes are indeed distinctive: with respect to lexical language models trained on the newswire portions of the Brown corpus [21], memorable quotes have significantly lower likelihood than their non-memorable counterparts. |
Hello. My name is Inigo Montoya. | In particular, we analyze a corpus of advertising slogans, and we show that these slogans have significantly greater likelihood at both the word level and the part-of-speech level with respect to a language model trained on memorable movie quotes, compared to a corresponding language model trained on non-memorable movie quotes. |
Never send a human to do a machine’s job. | In particular, the newswire section of the Brown corpus is predicted better at the lexical level by the language model trained on non-memorable quotes. |
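The comparisons above reduce to asking which of two language models assigns a new text higher likelihood. Below is a minimal sketch of that test using add-one-smoothed unigram models over toy stand-in corpora; the real study uses the memorable and non-memorable movie-quote corpora and also part-of-speech-level models.

# Sketch: train two simple unigram models and check which one assigns a new
# text (e.g., a slogan) higher log-likelihood. Corpora and the test string
# are illustrative placeholders.
import math
from collections import Counter

def train_unigram(sentences):
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts)
    return lambda w: math.log((counts[w] + 1) / (total + vocab))  # add-one smoothing

memorable_lm = train_unigram(["my precious", "may the force be with you"])
nonmemorable_lm = train_unigram(["we should leave now", "it is getting late"])

slogan = "the force of clean"
ll_mem = sum(memorable_lm(w) for w in slogan.split())
ll_non = sum(nonmemorable_lm(w) for w in slogan.split())
print("closer to memorable quotes" if ll_mem > ll_non else "closer to non-memorable quotes")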
Conclusion and Outlook | We achieved best results when the model training data, MT tuning set, and MT evaluation set contained roughly the same genre. |
Discussion of Translation Results | For comparison, +POS indicates our class-based model trained on the 11 coarse POS tags only (e.g., “Noun”). |
Discussion of Translation Results | The best result—a +1.04 BLEU average gain—was achieved when the class-based model training data, MT tuning set, and MT evaluation set contained the same genre. |
Experiments | In Table 1, we show the first four samples of length between 15 and 20 generated from our model and a 5-gram model trained on the Penn Treebank.
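A small sketch of drawing such samples from an n-gram model, using NLTK's MLE model over a toy corpus as a stand-in for a 5-gram model trained on the Penn Treebank; the corpus, the order, and the 20-token cap are illustrative assumptions.

# Sketch: sample bounded-length word sequences from an n-gram model.
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

corpus = [["the", "market", "rallied", "today"],
          ["the", "company", "reported", "higher", "earnings"]]
order = 5
train_ngrams, vocab = padded_everygram_pipeline(order, corpus)
lm = MLE(order)
lm.fit(train_ngrams, vocab)

for seed in range(4):  # four samples of up to 20 tokens each
    print(" ".join(lm.generate(20, random_seed=seed)))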
Experiments | Table 5: Classification accuracies on the noisy WSJ for models trained on WSJ Sections 2–21 and our 1B token corpus.
Experiments | In Table 4, we also show the performance of the generative models trained on our 1B corpus. |
Experimental Results 5.1 Data Resources | Note that the latter are derived from models trained with the Los Angeles Times data, while the Holmes results are derived from models trained with 19th-century novels.
Sentence Completion via Language Modeling | Our baseline model is a Good-Turing smoothed model trained with the CMU language modeling toolkit (Clarkson and Rosenfeld, 1997).
Sentence Completion via Language Modeling | For the SAT task, we used a trigram language model trained on 1.1B words of newspaper data, described in Section 5.1. |
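A minimal sketch of the sentence-completion-as-language-modeling idea: score each candidate completion with an n-gram model and pick the most probable one. The trigram model below uses NLTK with a toy corpus and add-one smoothing as a stand-in for the Good-Turing and 1.1B-word newspaper models described above; the question and candidates are invented.

# Sketch: choose the completion whose filled-in sentence gets the highest
# trigram log-probability. Corpus, question, and candidates are illustrative.
from nltk.lm import Laplace
from nltk.lm.preprocessing import padded_everygram_pipeline

corpus = [["he", "was", "a", "man", "of", "few", "words"],
          ["she", "spoke", "with", "great", "calm"]]
train_ngrams, vocab = padded_everygram_pipeline(3, corpus)
lm = Laplace(3)
lm.fit(train_ngrams, vocab)

question = "he was a man of few {}"
candidates = ["words", "calm", "banana"]

def sentence_logprob(tokens):
    # Sum of trigram log-probabilities over the sentence.
    return sum(lm.logscore(tokens[i], tokens[max(0, i - 2):i]) for i in range(len(tokens)))

best = max(candidates, key=lambda c: sentence_logprob(question.format(c).split()))
print(best)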