A Skeleton-based Approach to MT 2.1 Skeleton Identification | As is standard in SMT, we further assume that 1) the translation process can be decomposed into a derivation of phrase-pairs (for phrase-based models) or translation rules (for syntax-based models); and 2) a linear function is used to assign a model score to each derivation.
A Skeleton-based Approach to MT 2.1 Skeleton Identification | The above problem can be redefined in a Viterbi fashion: we find the derivation d with the highest model score given s and τ:
A Skeleton-based Approach to MT 2.1 Skeleton Identification | The skeleton translation model focuses on the translation of the sentence skeleton, i.e., the solid (red) rectangles, while the full translation model computes the model score over all phrase-pairs, i.e., all solid and dashed rectangles.
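To make the linear-model formulation above concrete, here is a minimal sketch (Python, with hypothetical feature dictionaries; not the authors' actual implementation) of scoring a derivation as a weighted feature sum and picking the Viterbi derivation:

    # Each rule/phrase-pair is a dict of feature values; the model score of
    # a derivation is the weighted sum over all rule features.
    def model_score(derivation, weights):
        return sum(weights.get(f, 0.0) * v
                   for rule_feats in derivation
                   for f, v in rule_feats.items())

    def viterbi_derivation(derivations, weights):
        # The derivation d with the highest model score given s and tau.
        return max(derivations, key=lambda d: model_score(d, weights))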
Experiments | As described in Section 3.2, the weight of each variable is a linear combination of the language model score, three classifier confidence scores, and three classifier disagreement scores.
Experiments | We use the Web 1T 5-gram corpus (Brants and Franz, 2006) to compute the language model score for a sentence.
Experiments | Finally, the language model score, classifier confidence scores, and classifier disagreement scores are normalized to take values in [0, 1], based on the HOO 2011 development data.
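A minimal sketch of this weighting scheme, assuming min/max normalization bounds estimated on the development data (the coefficients and bounds here are placeholders, not the paper's tuned values):

    def normalize(x, lo, hi):
        # Rescale a raw score to [0, 1] using development-data bounds.
        return (x - lo) / (hi - lo) if hi > lo else 0.0

    def variable_weight(lm_score, confidences, disagreements, coeffs, bounds):
        # 1 LM score + 3 confidence scores + 3 disagreement scores = 7 components.
        raw = [lm_score] + list(confidences) + list(disagreements)
        normed = [normalize(x, lo, hi) for x, (lo, hi) in zip(raw, bounds)]
        return sum(c * x for c, x in zip(coeffs, normed))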
Inference with First Order Variables | The language model score h(s', LM) of s' based on a large web corpus;
Inference with First Order Variables | Next, to compute the weight of each variable, we collect the language model score and confidence scores from the article (ART), preposition (PREP), and noun number (NOUN) classifiers, i.e., E = {ART, PREP, NOUN}.
Inference with Second Order Variables | When measuring the gain due to setting a second order variable to 1 (change cat to cats), the associated weight is likely to be small, since A cats will get a low language model score, a low article classifier confidence score, and a low noun number classifier confidence score.
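The intuition in the last sentence can be sketched as follows (hypothetical scorer interfaces; lm.score and classifiers[cls].confidence are stand-ins for the paper's components):

    from collections import namedtuple

    Edit = namedtuple("Edit", "old new")   # e.g., Edit("cat", "cats")
    E = ["ART", "PREP", "NOUN"]

    def candidate_weight(sentence, edit, lm, classifiers, coeffs):
        corrected = sentence.replace(edit.old, edit.new, 1)  # "A cat" -> "A cats"
        score = coeffs["LM"] * lm.score(corrected)
        for cls in E:
            score += coeffs[cls] * classifiers[cls].confidence(corrected, edit)
        # For "A cats", the LM and classifier scores are all low, so the
        # resulting weight is small and the inference avoids this correction.
        return score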
Related Work | Features used in classification include surrounding words, part-of-speech tags, language model scores (Gamon, 2010), and parse tree structures (Tetreault et al., 2010).
Abstract | However, by using an SVM ranker to combine the realizer’s model score together with features from multiple parsers, including ones designed to make the ranker more robust to parsing mistakes, we show that significant increases in BLEU scores can be achieved. |
Conclusion | In this paper, we have shown that while using parse accuracy in a simple reranking strategy for self-monitoring fails to improve BLEU scores over a state-of-the-art averaged perceptron realization ranking model, it is possible to significantly increase BLEU scores using an SVM ranker that combines the realizer’s model score together with features from multiple parsers, including ones designed to make the ranker more robust to parsing mistakes that human readers would be unlikely to make. |
Introduction | Therefore, to develop a more nuanced self-monitoring reranker that is more robust to such parsing mistakes, we trained an SVM using dependency precision and recall features for all three parses, their n-best parsing results, and per-label precision and recall for each type of dependency, together with the realizer’s normalized perceptron model score as a feature. |
Reranking with SVMs 4.1 Methods | Similarly, we conjectured that large differences in the realizer’s perceptron model score may more reliably reflect human fluency preferences than small ones, and thus we combined this score with features for parser accuracy in an SVM ranker. |
Reranking with SVMs 4.1 Methods | perceptron model score: the score from the realizer's model, normalized to [0, 1] for the realizations in the n-best list
Reranking with SVMs 4.1 Methods | We trained different models to investigate the contribution made by different parsers and different types of features, with the perceptron model score included as a feature in all models. |
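A rough sketch of how such a feature vector might be assembled (the feature names and the parser interface here are illustrative, not the paper's exact set):

    def normalize_01(scores):
        lo, hi = min(scores), max(scores)
        return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

    def svm_feature_vectors(nbest_scores, realizations, parsers):
        # Normalize the realizer's perceptron scores over the n-best list.
        norm = normalize_01(nbest_scores)
        vectors = []
        for realization, p_score in zip(realizations, norm):
            feats = {"perceptron_model_score": p_score}
            for parser in parsers:  # e.g., the three parsers in the text
                prec, rec = parser.precision_recall(realization)
                feats[parser.name + "_dep_precision"] = prec
                feats[parser.name + "_dep_recall"] = rec
            vectors.append(feats)
        return vectors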
Experiments | Both show an improved relationship between model score and F-measure.
Oracle Parsing | Digging deeper, we compared parser model score against Viterbi F-score and oracle F-score at a va-
Bk ← BESTS(X, Bk-1, W) | beam B (sequences in B are sorted in terms of model score, i.e., w·φ(B[0]) > w·φ(B[1]) > …).
Introduction | [Figure: beam contents from step 1 through step k, showing the correct (C) and predicted (P) action sequences with their model scores and the point of a valid update.]
Introduction | The numbers following C/P are model scores.
Introduction | In particular, one needs to guarantee that each parameter update is valid, i.e., the correct action sequence has a lower model score than the predicted one.
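A minimal sketch of this validity check for a perceptron update (feature vectors as dicts; this is the generic violation condition, not the paper's exact pseudocode):

    def score(w, phi):
        return sum(w.get(f, 0.0) * v for f, v in phi.items())

    def valid_perceptron_update(w, phi_gold, phi_pred, lr=1.0):
        # Update only if the gold sequence scores below the predicted one,
        # i.e., the update corrects a genuine violation.
        if score(w, phi_gold) < score(w, phi_pred):
            for f, v in phi_gold.items():
                w[f] = w.get(f, 0.0) + lr * v
            for f, v in phi_pred.items():
                w[f] = w.get(f, 0.0) - lr * v
            return True
        return False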
A Probabilistic Formulation for HVR | • Acoustic model score: p(O|W) • Haptic model score: p(H|L)
A Probabilistic Formulation for HVR | • PLI model score: P(L|W)
A Probabilistic Formulation for HVR | • Language model score: P(W)
Introduction | In the search procedure, the model score must be computed frequently for the search heuristic function, which poses a decoding-efficiency challenge for neural network based translation models.
Introduction | The main reason why cube-pruning works is that the translation model is linear and the model score for the language model is approximately monotonic (Chiang, 2007). |
Introduction | The premise of cube-pruning is that the language model score is approximately monotonic (Chiang, 2007). |
Related Work | The training objective is that the original n-gram should obtain a language model score higher than that of the corrupted n-gram by a margin of 1.
Related Work | (1) where t is the original n-gram, t^c is the corrupted n-gram, and f_CW(·) is a one-dimensional scalar representing the language model score of the input n-gram.
Related Work | The output f_CW is the language model score of the input, which is calculated as given in Equation 2, where L is the lookup table of word embeddings and w1, w2, b1, b2 are the parameters of the linear layers.
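Putting the margin condition above into code, a minimal sketch of the pairwise ranking (hinge) loss (f is any callable returning a scalar LM score, playing the role of f_CW in the text):

    def ranking_loss(f, t, t_corrupt):
        # Zero loss once the original n-gram outscores the corrupted one
        # by at least a margin of 1.
        return max(0.0, 1.0 - f(t) + f(t_corrupt))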
A Class-based Model of Agreement | The agreement model scores sequences of morpho-syntactic word classes, which express grammatical features relevant to agreement. |
A Class-based Model of Agreement | However, in MT, we seek a measure of sentence quality q(e) that is comparable across different hypotheses on the beam (much like the n-gram language model score).
A Class-based Model of Agreement | Discriminative model scores have been used as MT features (Galley and Manning, 2009), but we obtained better results by scoring the 1-best class sequences with a generative model.
Inference during Translation Decoding | The agreement model score is one decoder feature function. |
Introduction | Our model scores hypotheses during decoding. |
Background: Hypergraphs | The labels for leaves will be words, and will be important in defining strings and language model scores for those strings. |
Conclusion | For each v ∈ V_L, define α_v = max_{p: v_3(p)=v} β(p), where β(p) = h(v_1(p), v_2(p), v_3(p)) − λ_1(v_1(p)) − λ_2(v_2(p)) − Σ_{s∈p_1(p)} λ_1(s) − Σ_{s∈p_2(p)} λ_2(s). Here h is a function that computes language model scores, and the other terms involve Lagrange multipliers.
Introduction | Informally, the first decoding algorithm incorporates the weights and hard constraints on translations from the synchronous grammar, while the second decoding algorithm is used to integrate language model scores . |
Introduction | We compare our method to cube pruning (Chiang, 2007), and find that our method gives improved model scores on a significant number of examples. |
The Full Algorithm | In the simple algorithm, the first step was to predict the previous leaf for each leaf v, under a score that combined a language model score with a Lagrange multiplier score (i.e., compute arg max_{v'} δ(v', v), where δ(v', v) = h(v', v) + λ(v')). In this section we describe an algorithm that for each leaf v again predicts the previous leaf, but in addition predicts the full path back to that leaf.
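A small sketch of that first step (h and lam are stand-ins for the language model score and the Lagrange multipliers; the full path-prediction extension is omitted):

    def predict_previous_leaves(leaves, h, lam):
        # For each leaf v, choose the previous leaf v' maximizing the
        # combined LM + Lagrange multiplier score.
        return {v: max(leaves, key=lambda vp: h(vp, v) + lam[vp])
                for v in leaves}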
Abstract | The minimum Bayes risk (MBR) decoding objective improves BLEU scores for machine translation output relative to the standard Viterbi objective of maximizing model score . |
Computing Feature Expectations | Translation forests compactly encode an exponential number of output translations for an input sentence, along with their model scores . |
Computing Feature Expectations | Decoder states can include additional information as well, such as local configurations for dependency language model scoring.
Computing Feature Expectations | The n-gram language model score of e similarly decomposes over the hyperedges h in e that produce n-grams.
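As a simplified stand-in for the forest-based computation, here is a sketch of consensus (MBR-style) selection over an n-best list: posteriors come from exponentiated model scores, and candidates are ranked by expected n-gram overlap (the actual method operates on the full forest):

    import math
    from collections import Counter

    def ngrams(tokens, n=2):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def mbr_select(nbest, model_scores, n=2):
        # Posterior over candidates from exponentiated model scores.
        m = max(model_scores)
        post = [math.exp(s - m) for s in model_scores]
        z = sum(post)
        post = [p / z for p in post]
        # Expected n-gram counts under the posterior.
        expected = Counter()
        for toks, p in zip(nbest, post):
            for g, c in ngrams(toks, n).items():
                expected[g] += p * c
        # Pick the candidate with the highest expected n-gram overlap.
        def gain(toks):
            return sum(min(c, expected[g]) for g, c in ngrams(toks, n).items())
        return max(nbest, key=gain)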
Experimental Results | Figure 4: Three translations of an example Arabic sentence: its human-generated reference, the translation with the highest model score under Hiero (Viterbi), and the translation chosen by forest-based consensus decoding.
Ensemble Decoding | where ⊕ denotes the mixture operation between two or more model scores.
Ensemble Decoding | • Weighted Max (wmax): where the ensemble score is the weighted max of all model scores.
Ensemble Decoding | Since in log-linear models the model scores are not normalized to form probability distributions, the scores that different models assign to each phrase-pair may not be on the same scale.
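Two of the mixture operations, sketched over already-computed component scores for a single phrase-pair (the cross-component scaling issue discussed above is left out):

    def wsum(scores, weights):
        # Weighted sum of component model scores.
        return sum(w * s for w, s in zip(weights, scores))

    def wmax(scores, weights):
        # Weighted max: the ensemble score is the best weighted component score.
        return max(w * s for w, s in zip(weights, scores))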
Experiments & Results 4.1 Experimental Setup | An interesting observation from the results in Table 3 is that uniform weights do reasonably well given that the component weights are not optimized and therefore the model scores may not be on the same scale (refer to the discussion in §3.2).
Abstract | This paper applies MST parsing to MT, and describes how it can be integrated into a phrase-based decoder to compute dependency language model scores . |
Dependency parsing for machine translation | While it seems that loopy graphs are undesirable when the goal is to obtain a syntactic analysis, that is not necessarily the case when one just needs a language modeling score . |
Machine translation experiments | We use the standard features implemented almost exactly as in Moses: four translation features (phrase-based translation probabilities and lexically-weighted probabilities), word penalty, phrase penalty, linear distortion, and language model score . |
Machine translation experiments | Dependency language model score computed with the dependency parsing algorithm described in Section 2.
Experiments | This figure indicates the advantage of the two-pass decoding strategy in producing translations with a high model score in less time. |
Experiments | However, model scores do not directly translate into BLEU scores. |
Experiments | Simply by exploring more (200 times the log beam) after-goal items, we can optimize the Viterbi synchronous parse significantly, as shown in Figure 3 (left) in terms of model score versus search time.
Dependency Language Model | In order to calculate the dependency language model score, or depLM score for short, on the fly for
Implementation Details | Language model score.
Implementation Details | Dependency language model score.
Abstract | In this work, we construct a statistical model of grammaticality using various linguistic features (e.g., misspelling counts, parser outputs, n-gram language model scores).
Discussion and Conclusions | While Post found that such a system can effectively distinguish grammatical news text sentences from sentences generated by a language model, measuring the grammaticality of real sentences from language learners seems to require a wider variety of features, including n-gram counts, language model scores, etc.
Experiments | To create further baselines for comparison, we selected the following features that represent ways one might approximate grammaticality if a comprehensive model were unavailable: whether the link parser can fully parse the sentence (complete_link), the Gigaword language model score (gigaword_avglogprob), and the number of misspelled tokens (num_misspelled).
Model Training | L_SGT(W, v, s_[1,n]) = −log( exp(y^[1,n]_oracle) / Σ_{t∈nbest} exp(y_t) )  (7), where y^[1,n]_oracle is the model score of an oracle translation candidate for the span [1, n].
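Reading Equation 7 as a standard softmax loss over the n-best model scores, a numerically stable sketch (the names and indexing are mine, reconstructed from the garbled extract):

    import math

    def sgt_loss(nbest_scores, oracle_index):
        # Negative log-probability of the oracle candidate under a softmax
        # over the n-best model scores (log-sum-exp for stability).
        m = max(nbest_scores)
        log_z = m + math.log(sum(math.exp(y - m) for y in nbest_scores))
        return -(nbest_scores[oracle_index] - log_z)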
Our Model | for SMT performance, such as language model score and distortion model score . |
Our Model | The commonly used features, such as translation score, language model score and distortion score, are used as the recurrent input vector x.
Experiments | Method 4, named REBOL, implements REsponse-Based Online Learning by instantiating y+ and y− to the form described in Section 4: in addition to the model score s, it uses a cost function c based on sentence-level BLEU (Nakov et al., 2012) and tests translation hypotheses for task-based feedback using a binary execution function e.
Response-based Online Learning | (2012), inter alia) and incorporates the current model score, leading to various ramp loss objectives described in Gimpel and Smith (2012).
Response-based Online Learning | The opposite of y+ is the translation y− that leads to negative feedback, has a high model score, and a high cost.
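A sketch of the y+/y− selection implied by the last two sentences (s, c, e stand for the model score, cost, and binary execution-feedback functions named above; the exact selection rule in the paper may differ):

    def select_pair(kbest, s, c, e):
        pos = [y for y in kbest if e(y)]        # positive task feedback
        neg = [y for y in kbest if not e(y)]    # negative task feedback
        # y+ : positive feedback, high model score, low cost.
        y_plus = max(pos, key=lambda y: s(y) - c(y)) if pos else None
        # y- : negative feedback, high model score, high cost.
        y_minus = max(neg, key=lambda y: s(y) + c(y)) if neg else None
        return y_plus, y_minus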