Decoding | The language model (LM) scoring is directly integrated into the cube pruning algorithm.
Decoding | Thus, LM estimates are available for all considered hypotheses. |
Decoding | Because the output trees can remain discontiguous after hypothesis creation, LM scoring has to be done individually over all output trees. |
Approach to Sentence-Level Dialect Identification | The aforementioned approach relies on language models (LMs) and MSA and EDA morphological analyzers to decide whether each word is (a) MSA, (b) EDA, (c) both (MSA and EDA), or (d) OOV.
Approach to Sentence-Level Dialect Identification | The following variants of the underlying token-level system are built to assess how the level of preprocessing applied to the underlying LM affects the performance of the overall sentence-level dialect identification process: (1) Surface, (2) Tokenized, (3) CODAfied, and (4) Tokenized-CODA.
Approach to Sentence-Level Dialect Identification | Orthography-Normalized (CODAfied) LM: since DA is not originally a written form of Arabic, no standard orthography exists for it.
Related Work | Amazon Mechanical Turk and try a language modeling (LM) approach to solve the problem.
Comparative Study | Figure 4: bilingual LM illustration. |
Comparative Study | 4.3 Bilingual LM |
Comparative Study | We build a 9-gram LM using the SRILM toolkit (Stolcke, 2002) with modified Kneser-Ney smoothing.
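A minimal sketch of how such a model could be built, assuming SRILM is installed and on the PATH; the file names below are placeholders, not the paper's actual data:

```python
import subprocess

# Sketch: build a 9-gram LM with SRILM's ngram-count using modified
# Kneser-Ney discounting with interpolation, then report perplexity
# on held-out text. File names are hypothetical placeholders.
subprocess.run(
    ["ngram-count",
     "-order", "9",
     "-text", "train.tok",            # tokenized training text, one sentence per line
     "-kndiscount", "-interpolate",   # modified Kneser-Ney smoothing
     "-lm", "bilingual.9gram.arpa"],  # output LM in ARPA format
    check=True)

subprocess.run(
    ["ngram", "-order", "9",
     "-lm", "bilingual.9gram.arpa",
     "-ppl", "dev.tok"],              # perplexity on a development set
    check=True)
```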
Experiments | Our baseline is a phrase-based decoder, which includes the following models: an n-gram target-side language model (LM), a phrase translation model, and a word-based lexicon model.
Experiments | Lowercased training data from the GALE task (Table 1, UN corpus not included); alignment trained with GIZA++; tuning corpus: NIST06; test corpora: NIST02, 03, 04, 05, and 08; 5-gram LM (1,694,412,027 running words) trained with the SRILM toolkit (Stolcke, 2002) using modified Kneser-Ney smoothing; LM training data: target side of the bilingual data.
Abstract | We propose an online approach integrating a source LM and/or back-transliteration and an English LM.
Experiments | We observed a slight improvement by incorporating the source LM, with a 0.48-point F-value increase over the baseline, which translates to a 4.65-point Katakana F-value change and a 16.0% (3.56% to 2.99%) WER reduction, mainly due to its higher Katakana word rate (11.2%).
Experiments | This type of error is reduced by +LM-P, e.g., *プラス チック purasu chikku "*plus tick" is corrected to プラスチック purasuchikku "plastic" due to LM projection.
Experiments | performance, which may be because one cannot limit where the source LM features are applied. |
Introduction | We refer to this process of transliterating unknown words into another language and using the target LM as LM projection. |
Introduction | Since the model employs a general transliteration model and a general English LM , it achieves robust WS for unknown words. |
Use of Language Model | As we mentioned in Section 2, English LM knowledge helps split transliterated compounds. |
Use of Language Model | We use LM projection, which is a combination of back-transliteration and an English model, by extending the normal lattice building process as follows:
Use of Language Model | Then, edges are spanned between these extended English nodes, instead of between the original nodes, by additionally taking into consideration English LM features (IDs 21 and 22 in Table 1): $\phi^{LMP}(w_i) = \log p(w_i)$ and $\phi^{LMP}(w_{i-1}, w_i) = \log p(w_{i-1}, w_i)$.
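A rough sketch of how those two edge features could be computed, assuming the projected (back-transliterated) English forms and English LM probability lookups are available; all function names here are hypothetical:

```python
import math

def lm_projection_edge_features(prev_en, cur_en, en_unigram_prob, en_bigram_prob):
    """Return the English LM features attached to an edge between two
    projected English nodes: the unigram log probability of the current
    word and the bigram log probability of the word pair, mirroring the
    features described above (IDs 21 and 22)."""
    return {
        "en_lm_unigram": math.log(en_unigram_prob(cur_en)),
        "en_lm_bigram": math.log(en_bigram_prob(prev_en, cur_en)),
    }
```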
Word Segmentation Model | Figure 1: Example lattice with LM projection |
Baseline MT | The LM used for decoding is a log-linear combination of four word n-gram LMs built on different English corpora (details described in Section 5.1), with the LM weights optimized on a development set by minimum error rate training (MERT), to estimate the probability of a word given the preceding words.
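A minimal sketch of such a log-linear combination, assuming each component LM exposes a conditional word probability and the weights are tuned on a development set (e.g., by MERT); all names are placeholders:

```python
import math

def combined_lm_logprob(word, history, lms, weights):
    """Log-linear combination of several word n-gram LMs: the decoder-facing
    score of a word given its history is the weighted sum of the individual
    log probabilities. `lms` are callables returning p(word | history);
    `weights` are the tuned interpolation weights."""
    return sum(w * math.log(lm(word, history)) for lm, w in zip(lms, weights))
```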
Experiments | LM1 is a 7-gram LM trained on the target ...
Experiments | LM2 is a 7-gram LM trained only on the English monolingual discussion forums data listed above. |
Experiments | LM3 is a 4-gram LM trained on the web genre among the target side of all parallel text (i.e., web text from pre-BOLT parallel text and BOLT released discussion forum parallel text). |
Introduction | Optimize name translation and context translation simultaneously and conduct name translation driven decoding with language model (LM) based selection (Section 3.2).
Related Work | enable the LM to decide which translations to choose when encountering the names in the texts (Ji et al., 2009). |
Related Work | The LM selection method often assigns an inappropriate weight to the additional name translation table because it is constructed independently from translation of context words; therefore after weighted voting most correct name translations are not used in the final translation output. |
Related Work | More importantly, in these approaches the MT model was still mostly treated as a “black-box” because neither the translation model nor the LM was updated or adapted specifically for names. |
Discriminative Reranking for OCR | Base features include the HMM and LM scores produced by the OCR system. |
Discriminative Reranking for OCR | Word LM features ("LM-word") include the log probabilities of the hypothesis obtained using n-gram LMs with n ∈ {1, ...}.
Discriminative Reranking for OCR | The LMs are built using the SRI Language Modeling Toolkit (Stolcke, 2002).
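A small sketch of what such an "LM-word" feature block could look like, assuming each order-n LM exposes a sentence-level log-probability method; the interface and names are illustrative only:

```python
def word_lm_features(hypothesis, lms_by_order):
    """One reranking feature per n-gram order: the log probability of the
    hypothesis (a list of tokens) under the order-n word LM. These features
    would sit alongside the base HMM and LM scores from the OCR system."""
    return {f"lm_word_{n}": lm.logprob(hypothesis)
            for n, lm in sorted(lms_by_order.items())}
```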
Experiments | Our baseline is based on the sum of the logs of the HMM and LM scores. |
Experiments | For LM training we used 220M words from Arabic Gigaword 3, and 2.4M words from each of the "print" and "hand" ground-truth annotations.
Introduction | The BBN Byblos OCR system (Natarajan et al., 2002; Prasad et al., 2008; Saleem et al., 2009), which we use in this paper, relies on a hidden Markov model (HMM) to recover the sequence of characters from the image, and uses an n-gram language model (LM) to emphasize the fluency of the output.
Introduction | For an input image, the OCR decoder generates an n-best list of hypotheses each of which is associated with HMM and LM scores. |
Previous Work | Our LM experiments also affirmed the importance of in-domain English LMs. |
Previous Work | Table 1 header: Train | LM | BLEU | OOV
Previous Work | Table 1: Baseline results using the EG and AR training sets with GW and EGen corpora for LM training
Proposed Methods 3.1 Egyptian to EG’ Conversion | Then we used a trigram LM that we built from the aforementioned Aljazeera articles to pick the most likely candidate in context. |
Proposed Methods 3.1 Egyptian to EG’ Conversion | We simply multiplied the character-level transformation probability by the LM probability, giving them equal weight.
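A compact sketch of that selection step, assuming scoring functions for the character-level transformation model and the trigram LM; all names are hypothetical:

```python
def pick_most_likely(candidates, translit_logprob, lm_logprob, context):
    """Pick the candidate that maximizes the product of the character-level
    transformation probability and the trigram LM probability in context.
    With equal weights, multiplying probabilities is just summing log-probs."""
    return max(candidates,
               key=lambda cand: translit_logprob(cand) + lm_logprob(cand, context))
```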
Proposed Methods 3.1 Egyptian to EG’ Conversion | S1 and S2 are trained on the EG’ data, with EGen and with both EGen and GW used for LM training, respectively.
Bayesian MT Decipherment via Hash Sampling | A high value for the LM concentration parameter α ensures that the LM probabilities do not deviate too far from the original fixed base distribution during sampling.
Decipherment Model for Machine Translation | For P(e), we use a word n-gram language model (LM) trained on monolingual target text.
Experiments and Results | We observe that our method produces much better results than the others even with a 2-gram LM.
Experiments and Results | With a 3-gram LM, the new method achieves the best performance; the highest BLEU score reported on this task.
Experiments and Results | Bayesian Hash Sampling with 2-gram LM: vocab=full (V6), add_fertility=no: 4.2; vocab=pruned*, add_fertility=yes: 5.3
Experiments | Other features include lexical weighting in both directions, word count, a distance-based RM, a 4-gram LM trained on the target side of the parallel data, and a 6-gram English Gigaword LM.
Experiments | Some of the results reported above involved linear TM mixtures, but none of them involved linear LM mixtures. |
Experiments | For instance, with an initial Chinese system that employs linear mixture LM adaptation (lin-lm) and has a BLEU of 32.1, adding 1-feature VSM adaptation (+vsm, joint) improves performance to 33.1 (improvement significant at p < 0.01), while adding 3-feature VSM instead (+vsm, 3 feat.)
Introduction | Both were studied in (Foster and Kuhn, 2007), which concluded that the best approach was to combine sub-models of the same type (for instance, several different TMs or several different LMs) linearly, while combining models of different types (for instance, a mixture TM with a mixture LM) log-linearly.
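To make the distinction concrete, here is a small sketch contrasting the two combination schemes; the probabilities and weights are placeholders, not values from the cited work:

```python
import math

def linear_mixture(probs, lambdas):
    """Linear mixture of same-type models (e.g., several LMs):
    p(w | h) = sum_i lambda_i * p_i(w | h), with the lambdas summing to 1."""
    return sum(l * p for l, p in zip(lambdas, probs))

def loglinear_combination(probs, lambdas):
    """Log-linear combination across model types (e.g., a mixture TM with a
    mixture LM): the score is sum_i lambda_i * log p_i, as in a standard
    log-linear SMT model."""
    return sum(l * math.log(p) for l, p in zip(lambdas, probs))
```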
Inference with First Order Variables | The language model score $h(s', LM)$ of $s'$, based on a large web corpus;
Inference with First Order Variables | $w_{l,m,k} = \nu_{LM}\, h(s', LM) + \sum_t \lambda_t\, f(s', t) + \ldots$
Inference with First Order Variables | $w_{Noun,2,singular} = \nu_{LM}\, h(s', LM) + \lambda_{ART}\, f(s', ART) + \lambda_{PREP}\, f(s', PREP) + \lambda_{NOUN}\, f(s', NOUN) + \mu_{ART}\, g(s', ART) + \mu_{PREP}\, g(s', PREP) + \mu_{NOUN}\, g(s', NOUN)$.
Inference with Second Order Variables | $w_{u,v} = \nu_{LM}\, h(s', LM) + \sum_t \lambda_t\, f(s', t) + \ldots$
Discussion | In our preliminary experiments with the smaller trigram LM, MERT did better on MT05 in the smaller feature set, and MIRA had a small advantage in two cases.
Experiments | We trained a 4-gram LM on the |
Experiments | In preliminary experiments with a smaller trigram LM, our RM method consistently yielded the highest scores in all Chinese-English tests, up to 1.6 BLEU and 6.4 TER better than MIRA, the second-best performer.
The Relative Margin Machine in SMT | Table row: 4-gram LM | 24M | 600M | —
Translation Model Architecture | We find that an adaptation of the TM and LM to the full development set (system “1 cluster”) yields the smallest improvements over the unadapted baseline. |
Translation Model Architecture | For the IT test set, the system with gold labels and TM adaptation yields an improvement of 0.7 BLEU (21.1 → 21.8), LM adaptation yields 1.3 BLEU (21.1 → 22.4), and adapting both models outperforms the baseline by 2.1 BLEU (21.1 → 23.2).
Translation Model Architecture | TM adaptation with 8 clusters (21.1 → 21.8 → 22.1), or LM adaptation with 4 or 8 clusters (21.1 → 22.4 → 23.1).
Experiments | Methods (MAP, P@10): 1. VSM (0.242, 0.226); 2. LM (0.385, 0.242); 3. Jeon et al. ...
Experiments | Row 1 and row 2 are two baseline systems, which model the relevance score using VSM (Cao et al., 2010) and a language model (LM) (Zhai and Lafferty, 2001; Cao et al., 2010) in the term space.
Experiments | (1) Monolingual translation models significantly outperform the VSM and LM (row 1 and row 2).
Introduction | As a principled approach to capturing semantic word relations, word-based translation models are built using IBM Model 1 (Brown et al., 1993) and have been shown to outperform traditional models (e.g., VSM, BM25, LM) for question retrieval.