Index of papers in Proc. ACL 2013 that mention
  • LM
Braune, Fabienne and Seemann, Nina and Quernheim, Daniel and Maletti, Andreas
Decoding
The language model (LM) scoring is directly integrated into the cube pruning algorithm.
Decoding
Thus, LM estimates are available for all considered hypotheses.
Decoding
Because the output trees can remain discontiguous after hypothesis creation, LM scoring has to be done individually over all output trees.
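Since these excerpts describe LM scoring being applied separately to each discontiguous output tree during cube pruning, here is a minimal Python sketch of that per-tree scoring idea. It is not the authors' decoder: the hypothesis representation, the toy model, and the `ngram_logprob` helper are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

def ngram_logprob(ngram: Tuple[str, ...], model: Dict[Tuple[str, ...], float]) -> float:
    """Hypothetical n-gram log-probability lookup with a crude constant backoff."""
    return model.get(ngram, math.log(1e-6))

def score_hypothesis(output_trees: List[List[str]],
                     model: Dict[Tuple[str, ...], float],
                     order: int = 3) -> float:
    """Score the yield of each discontiguous output tree separately and sum the
    results, since no n-gram context spans the gap between two trees."""
    total = 0.0
    for tree_yield in output_trees:
        padded = ["<s>"] * (order - 1) + tree_yield  # simplification: pad every fragment
        for i in range(order - 1, len(padded)):
            total += ngram_logprob(tuple(padded[i - order + 1:i + 1]), model)
    return total

# Toy usage: one hypothesis whose output is still two discontiguous fragments.
toy_model = {("<s>", "<s>", "the"): math.log(0.4)}
print(score_hypothesis([["the", "house"], ["is", "green"]], toy_model))
```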
LM is mentioned in 17 sentences in this paper.
Elfardy, Heba and Diab, Mona
Approach to Sentence-Level Dialect Identification
The aforementioned approach relies on language models (LMs) and an MSA and EDA morphological analyzer to decide whether each word is (a) MSA, (b) EDA, (c) Both (MSA & EDA), or (d) OOV.
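As a rough illustration of the token-level decision just described, the sketch below labels each word by checking which side recognizes it. The vocabulary-lookup approximation is an assumption for illustration; the paper additionally consults LM scores and morphological analyses.

```python
def label_token(word: str, msa_vocab: set, eda_vocab: set) -> str:
    """Label a word as MSA, EDA, Both, or OOV depending on which resources cover it.
    Simple set membership stands in for the LM and morphological-analyzer evidence."""
    in_msa = word in msa_vocab
    in_eda = word in eda_vocab
    if in_msa and in_eda:
        return "Both"
    if in_msa:
        return "MSA"
    if in_eda:
        return "EDA"
    return "OOV"

# Toy usage with placeholder vocabularies.
print(label_token("kitab", {"kitab", "bayt"}, {"kitab", "ezzay"}))  # -> "Both"
```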
Approach to Sentence-Level Dialect Identification
The following variants of the underlying token-level system are built to assess the effect of varying the level of preprocessing on the underlying LM on the performance of the overall sentence level dialect identification process: (1) Surface, (2) Tokenized, (3) CODAfied, and (4) Tokenized-CODA.
Approach to Sentence-Level Dialect Identification
Orthography Normalized (CODAfied) LM: since DA is not originally a written form of Arabic, no standard orthography exists for it.
Related Work
Amazon Mechanical Turk and try a language modeling (LM) approach to solve the problem.
LM is mentioned in 15 sentences in this paper.
Feng, Minwei and Peter, Jan-Thorsten and Ney, Hermann
Comparative Study
Figure 4: bilingual LM illustration.
Comparative Study
4.3 Bilingual LM
Comparative Study
We build a 9-gram LM using SRILM toolkit (Stolcke, 2002) with modified Kneser-Ney smoothing.
Experiments
Our baseline is a phrase-based decoder, which includes the following models: an n-gram target-side language model ( LM ), a phrase translation model and a word-based lexicon model.
Experiments
• lowercased training data from the GALE task (Table 1, UN corpus not included); alignment trained with GIZA++
• tuning corpus: NIST06; test corpora: NIST02, 03, 04, 05, and 08
• 5-gram LM (1,694,412,027 running words) trained by the SRILM toolkit (Stolcke, 2002) with modified Kneser-Ney smoothing; training data: target side of the bilingual data
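The LMs above are trained with the SRILM command-line toolkit. As a stand-in, the following sketch builds a small interpolated Kneser-Ney model with NLTK's `nltk.lm` module; the choice of NLTK, the toy corpus, and the low order are assumptions made only to keep the example self-contained.

```python
from nltk.lm import KneserNeyInterpolated
from nltk.lm.preprocessing import padded_everygram_pipeline

order = 3  # the papers indexed here use 4- to 9-gram models; 3 keeps the toy data tractable
corpus = [["we", "build", "a", "language", "model"],
          ["we", "build", "a", "translation", "model"]]

train_ngrams, vocab = padded_everygram_pipeline(order, corpus)
lm = KneserNeyInterpolated(order)
lm.fit(train_ngrams, vocab)

# Log-probability of a word given its (order-1)-word context.
print(lm.logscore("language", ("build", "a")))
```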
LM is mentioned in 14 sentences in this paper.
Hagiwara, Masato and Sekine, Satoshi
Abstract
We propose an online approach, integrating source LM and/or back-transliteration and English LM.
Experiments
We observed slight improvement by incorporating the source LM, and observed a 0.48 point F-value increase over baseline, which translates to 4.65 point Katakana F-value change and 16.0% (3.56% to 2.99%) WER reduction, mainly due to its higher Katakana word rate (11.2%).
Experiments
This type of error is reduced by +LM-P, e.g., *プラス チック *purasu chikku "*plus tick" to プラスチック purasuchikku "plastic", due to LM projection.
Experiments
performance, which may be because one cannot limit where the source LM features are applied.
Introduction
We refer to this process of transliterating unknown words into another language and using the target LM as LM projection.
Introduction
Since the model employs a general transliteration model and a general English LM , it achieves robust WS for unknown words.
Use of Language Model
As we mentioned in Section 2, English LM knowledge helps split transliterated compounds.
Use of Language Model
We use LM projection, which is a combination of back-transliteration and an English model, by extending the normal lattice building process as follows:
Use of Language Model
Then, edges are spanned between these extended English nodes, instead of between the original nodes, by additionally taking into consideration English LM features (ID 21 and 22 in Table 1): $\phi^{\mathrm{LMP}}(w_i) = \log p(w_i)$ and $\phi^{\mathrm{LMP}}(w_{i-1}, w_i) = \log p(w_{i-1}, w_i)$.
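A hedged Python sketch of the edge-scoring step quoted above: an edge between two projected (back-transliterated) English nodes receives unigram and bigram English LM features in addition to the usual segmentation features. The node representation and the placeholder LM tables are assumptions, not the authors' lattice code.

```python
import math
from typing import Dict, Tuple

def lm_projection_features(prev_en: str, cur_en: str,
                           unigram_lm: Dict[str, float],
                           bigram_lm: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """English LM features on an edge between two projected nodes:
    phi_LMP(w_i) = log p(w_i) and phi_LMP(w_{i-1}, w_i) = log p(w_{i-1}, w_i)."""
    floor = math.log(1e-8)  # crude floor for unseen events
    return {
        "en_unigram_logp": unigram_lm.get(cur_en, floor),
        "en_bigram_logp": bigram_lm.get((prev_en, cur_en), floor),
    }

# Toy usage: a Katakana compound back-transliterated to the English candidate "plastic".
uni = {"plastic": math.log(1e-4)}
bi = {("cheap", "plastic"): math.log(1e-5)}
print(lm_projection_features("cheap", "plastic", uni, bi))
```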
Word Segmentation Model
Figure 1: Example lattice with LM projection
LM is mentioned in 19 sentences in this paper.
Li, Haibo and Zheng, Jing and Ji, Heng and Li, Qi and Wang, Wen
Baseline MT
The LM used for decoding is a log-linear combination of four word n-gram LMs which are built on different English corpora (details described in section 5.1), with the LM weights optimized on a development set and determined by minimum error rate training (MERT), to estimate the probability of a word given the preceding words.
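The combination described here can be sketched in a few lines: the score of a word in context is a weighted sum of the component LMs' log-probabilities, with weights that MERT would tune on a development set. The component models and weights below are placeholders.

```python
import math
from typing import Callable, List, Sequence

def combined_logprob(word: str, context: Sequence[str],
                     lms: List[Callable[[str, Sequence[str]], float]],
                     weights: List[float]) -> float:
    """Log-linear combination of word n-gram LMs: sum_i w_i * log p_i(word | context).
    The weights would be tuned with MERT on a development set."""
    return sum(w * lm(word, context) for lm, w in zip(lms, weights))

# Toy usage with two placeholder component LMs.
lm_a = lambda w, ctx: math.log(0.01)
lm_b = lambda w, ctx: math.log(0.02)
print(combined_logprob("forum", ("discussion",), [lm_a, lm_b], [0.6, 0.4]))
```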
Experiments
LM1 is a 7-gram LM trained on the tar-
Experiments
LM2 is a 7-gram LM trained only on the English monolingual discussion forums data listed above.
Experiments
LM3 is a 4-gram LM trained on the web genre among the target side of all parallel text (i.e., web text from pre-BOLT parallel text and BOLT released discussion forum parallel text).
Introduction
Optimize name translation and context translation simultaneously and conduct name translation driven decoding with language model (LM) based selection (Section 3.2).
Related Work
enable the LM to decide which translations to choose when encountering the names in the texts (Ji et al., 2009).
Related Work
The LM selection method often assigns an inappropriate weight to the additional name translation table because it is constructed independently from translation of context words; therefore after weighted voting most correct name translations are not used in the final translation output.
Related Work
More importantly, in these approaches the MT model was still mostly treated as a “black-box” because neither the translation model nor the LM was updated or adapted specifically for names.
LM is mentioned in 12 sentences in this paper.
Tomeh, Nadi and Habash, Nizar and Roth, Ryan and Farra, Noura and Dasigi, Pradeep and Diab, Mona
Discriminative Reranking for OCR
Base features include the HMM and LM scores produced by the OCR system.
Discriminative Reranking for OCR
Word LM features ("LM-word") include the log probabilities of the hypothesis obtained using n-gram LMs with n ∈ {1, …
Discriminative Reranking for OCR
The LM models are built using the SRI Language Modeling Toolkit (Stolcke, 2002).
Experiments
Our baseline is based on the sum of the logs of the HMM and LM scores.
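Putting these excerpts together, a reranking feature vector for one OCR hypothesis might look like the sketch below: the base features are the decoder's HMM and LM scores (whose sum is the baseline), and the "LM-word" features add log probabilities from word n-gram LMs of several orders. The scoring callables are placeholders.

```python
from typing import Callable, Dict, Sequence

def rerank_features(hmm_score: float, lm_score: float,
                    hypothesis: Sequence[str],
                    word_lms: Dict[int, Callable[[Sequence[str]], float]]) -> Dict[str, float]:
    """Features for one OCR hypothesis: the base HMM and LM scores (their sum is the
    baseline ranking score) plus per-order "LM-word" log probabilities."""
    feats = {"hmm": hmm_score, "lm": lm_score, "baseline": hmm_score + lm_score}
    for order, logprob in word_lms.items():
        feats[f"lm_word_{order}gram"] = logprob(hypothesis)
    return feats

# Toy usage with placeholder unigram and bigram scorers.
fake_lms = {1: lambda h: -12.3, 2: lambda h: -10.8}
print(rerank_features(-45.0, -20.0, ["some", "words"], fake_lms))
```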
Experiments
For LM training we used 220M words from Arabic Gigaword 3, and 2.4M words from each “print” and “hand” ground truth annotations.
Introduction
The BBN Byblos OCR system (Natarajan et al., 2002; Prasad et al., 2008; Saleem et al., 2009), which we use in this paper, relies on a hidden Markov model (HMM) to recover the sequence of characters from the image, and uses an n-gram language model (LM) to emphasize the fluency of the output.
Introduction
For an input image, the OCR decoder generates an n-best list of hypotheses each of which is associated with HMM and LM scores.
LM is mentioned in 9 sentences in this paper.
Sajjad, Hassan and Darwish, Kareem and Belinkov, Yonatan
Previous Work
Our LM experiments also affirmed the importance of in-domain English LMs.
Previous Work
Table 1 header: Train | LM | BLEU | OOV
Previous Work
Table 1: Baseline results using the EG and AR training sets with GW and EGen corpora for LM training
Proposed Methods 3.1 Egyptian to EG’ Conversion
Then we used a trigram LM that we built from the aforementioned Aljazeera articles to pick the most likely candidate in context.
Proposed Methods 3.1 Egyptian to EG’ Conversion
We simply multiplied the character-level transformation probability with the LM probability — giving them equal weight.
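The selection rule quoted above, character-level transformation probability multiplied by the LM probability with equal weight, can be sketched as follows; both probability functions are placeholders for the authors' transformation model and trigram LM.

```python
import math
from typing import Callable, Sequence

def pick_candidate(candidates: Sequence[str],
                   translit_prob: Callable[[str], float],
                   lm_prob_in_context: Callable[[str], float]) -> str:
    """Choose the candidate maximizing P_transform(cand) * P_LM(cand | context),
    i.e. the two probabilities are combined with equal weight."""
    def score(cand: str) -> float:
        return math.log(translit_prob(cand)) + math.log(lm_prob_in_context(cand))
    return max(candidates, key=score)

# Toy usage with placeholder models.
print(pick_candidate(["ktb", "kitab"],
                     translit_prob=lambda c: 0.3 if c == "kitab" else 0.2,
                     lm_prob_in_context=lambda c: 0.05 if c == "kitab" else 0.01))
```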
Proposed Methods 3.1 Egyptian to EG’ Conversion
- S1 and S2 trained on the EG' with EGen and both EGen and GW for LM training respectively.
LM is mentioned in 8 sentences in this paper.
Ravi, Sujith
Bayesian MT Decipherment via Hash Sampling
A high value for the LM concentration parameter α ensures that the LM probabilities do not deviate too far from the original fixed base distribution during sampling.
Decipherment Model for Machine Translation
For P(e), we use a word n-gram language model (LM) trained on monolingual target text.
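The earlier excerpt's concentration parameter α can be illustrated with the standard count-plus-prior estimate used in Dirichlet-prior LMs: the larger α is, the closer the estimate stays to the fixed base distribution. This is a generic formula sketch, not the authors' sampler.

```python
def smoothed_lm_prob(count_hw: int, count_h: int, base_prob: float, alpha: float) -> float:
    """P(w | h) under a Dirichlet prior with concentration alpha and base distribution P0:
    (count(h, w) + alpha * P0(w | h)) / (count(h) + alpha).
    Large alpha keeps the estimate near P0; small alpha lets the observed counts dominate."""
    return (count_hw + alpha * base_prob) / (count_h + alpha)

# With a large concentration parameter the estimate barely moves from the base LM.
print(smoothed_lm_prob(count_hw=3, count_h=10, base_prob=0.2, alpha=1000.0))  # ~0.20
print(smoothed_lm_prob(count_hw=3, count_h=10, base_prob=0.2, alpha=1.0))     # ~0.29
```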
Experiments and Results
We observe that our method produces much better results than the others even with a 2-gram LM .
Experiments and Results
With a 3-gram LM , the new method achieves the best performance; the highest BLEU score reported on this task.
Experiments and Results
Table row: Bayesian Hash Sampling with 2-gram LM; vocab=full (Ve), add_fertility=no: 4.2; vocab=pruned*, add_fertility=yes: 5.3
LM is mentioned in 7 sentences in this paper.
Chen, Boxing and Kuhn, Roland and Foster, George
Experiments
Other features include lexical weighting in both directions, word count, a distance-based RM, a 4-gram LM trained on the target side of the parallel data, and a 6-gram English Gigaword LM .
Experiments
Some of the results reported above involved linear TM mixtures, but none of them involved linear LM mixtures.
Experiments
For instance, with an initial Chinese system that employs linear mixture LM adaptation (lin-lm) and has a BLEU of 32.1, adding 1-feature VSM adaptation (+vsm, joint) improves performance to 33.1 (improvement significant at p < 0.01), while adding 3-feature VSM instead (+vsm, 3 feat.)
Introduction
Both were studied in (Foster and Kuhn, 2007), which concluded that the best approach was to combine sub-models of the same type (for instance, several different TMs or several different LMs) linearly, while combining models of different types (for instance, a mixture TM with a mixture LM) log-linearly.
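The contrast drawn here between linear combination of same-type models and log-linear combination of different-type models can be made concrete with a short sketch; the probabilities and weights are illustrative.

```python
import math
from typing import Sequence

def linear_mixture(probs: Sequence[float], lambdas: Sequence[float]) -> float:
    """Linear mixture of same-type models, e.g. several LMs: sum_i lambda_i * p_i."""
    return sum(l * p for l, p in zip(lambdas, probs))

def log_linear_combination(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Log-linear combination of different model types: exp(sum_i w_i * log p_i),
    i.e. a weighted product, left unnormalized as in a standard SMT decoder."""
    return math.exp(sum(w * math.log(p) for w, p in zip(weights, probs)))

print(linear_mixture([0.02, 0.08], [0.5, 0.5]))          # mixture-LM probability
print(log_linear_combination([0.05, 0.02], [1.0, 0.7]))  # TM x LM style score
```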
LM is mentioned in 5 sentences in this paper.
Wu, Yuanbin and Ng, Hwee Tou
Inference with First Order Variables
The language model score $h(s', \mathrm{LM})$ of $s'$ based on a large web corpus;
Inference with First Order Variables
$w_{l,m,k} = \nu_{\mathrm{LM}}\, h(s', \mathrm{LM}) + \sum_t \lambda_t\, f(s', t) + \ldots$
Inference with First Order Variables
$w_{\mathrm{Noun},2,\mathrm{singular}} = \nu_{\mathrm{LM}}\, h(s', \mathrm{LM}) + \lambda_{\mathrm{ART}} f(s', \mathrm{ART}) + \lambda_{\mathrm{PREP}} f(s', \mathrm{PREP}) + \lambda_{\mathrm{NOUN}} f(s', \mathrm{NOUN}) + \mu_{\mathrm{ART}}\, g(s', \mathrm{ART}) + \mu_{\mathrm{PREP}}\, g(s', \mathrm{PREP}) + \mu_{\mathrm{NOUN}}\, g(s', \mathrm{NOUN}).$
Inference with Second Order Variables
$w_{u,v} = \nu_{\mathrm{LM}}\, h(s', \mathrm{LM}) + \sum_t \lambda_t\, f(s', t) + \ldots$
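Reading the fully written-out instance together with the two truncated general forms suggests the following shape for these weights; the exact index set is an assumption based on the error types that appear in the instance above.

$$ w \;=\; \nu_{\mathrm{LM}}\, h(s', \mathrm{LM}) \;+\; \sum_{t} \lambda_{t}\, f(s', t) \;+\; \sum_{t} \mu_{t}\, g(s', t), \qquad t \in \{\mathrm{ART}, \mathrm{PREP}, \mathrm{NOUN}\}. $$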
LM is mentioned in 5 sentences in this paper.
Eidelman, Vladimir and Marton, Yuval and Resnik, Philip
Discussion
In our preliminary experiments with the smaller trigram LM, MERT did better on MT05 in the smaller feature set, and MIRA had a small advantage in two cases.
Experiments
We trained a 4-gram LM on the
Experiments
In preliminary experiments with a smaller trigram LM , our RM method consistently yielded the highest scores in all Chinese-English tests — up to 1.6 BLEU and 6.4 TER from MIRA, the second best performer.
The Relative Margin Machine in SMT
Table row: 4-gram LM | 24M | 600M | –
LM is mentioned in 4 sentences in this paper.
Sennrich, Rico and Schwenk, Holger and Aransa, Walid
Translation Model Architecture
We find that an adaptation of the TM and LM to the full development set (system “1 cluster”) yields the smallest improvements over the unadapted baseline.
Translation Model Architecture
For the IT test set, the system with gold labels and TM adaptation yields an improvement of 0.7 BLEU (21.1 → 21.8), LM adaptation yields 1.3 BLEU (21.1 → 22.4), and adapting both models outperforms the baseline by 2.1 BLEU (21.1 → 23.2).
Translation Model Architecture
TM adaptation with 8 clusters (21.1 → 21.8 → 22.1), or LM adaptation with 4 or 8 clusters (21.1 → 22.4 → 23.1).
LM is mentioned in 4 sentences in this paper.
Zhou, Guangyou and Liu, Fang and Liu, Yang and He, Shizhu and Zhao, Jun
Experiments
Table excerpt (# | Methods | MAP | P@10): (1) VSM, 0.242, 0.226; (2) LM, 0.385, 0.242; (3) Jeon et al. …
Experiments
Row 1 and row 2 are two baseline systems, which model the relevance score using VSM (Cao et al., 2010) and language model (LM) (Zhai and Lafferty, 2001; Cao et al., 2010) in the term space.
Experiments
(1) Monolingual translation models significantly outperform the VSM and LM (row 1 and
Introduction
As a principled approach to capture semantic word relations, word-based translation models are built by using the IBM model 1 (Brown et al., 1993) and have been shown to outperform traditional models (e.g., VSM, BM25, LM) for question retrieval.
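The LM retrieval baseline cited here (Zhai and Lafferty, 2001) is a query-likelihood model with smoothing; below is a minimal sketch using Jelinek-Mercer smoothing, with the collection statistics supplied as placeholders.

```python
import math
from collections import Counter
from typing import Dict, Sequence

def query_likelihood(query: Sequence[str], doc_tokens: Sequence[str],
                     collection_prob: Dict[str, float], lam: float = 0.8) -> float:
    """Score log P(query | document LM) with Jelinek-Mercer smoothing:
    p(w|d) = lam * p_ml(w|d) + (1 - lam) * p(w|collection)."""
    tf = Counter(doc_tokens)
    dlen = max(len(doc_tokens), 1)
    score = 0.0
    for w in query:
        p_ml = tf[w] / dlen
        p_bg = collection_prob.get(w, 1e-7)
        score += math.log(lam * p_ml + (1.0 - lam) * p_bg)
    return score

# Toy usage: rank a candidate question against a query.
print(query_likelihood(["install", "driver"],
                       ["how", "to", "install", "a", "printer", "driver"],
                       {"install": 0.001, "driver": 0.0005}))
```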
LM is mentioned in 4 sentences in this paper.